Hebrew University and Princeton University

We develop necessary and sufficient conditions for uniqueness of 
the invariant measure of the filtering process associated to an ergodic 
hidden Markov model in a finite or countable state space. These re- 
sults provide a complete solution to a problem posed by Blackwell 
(1957), and subsume earlier partial results due to Kaijser, Kochman 
and Reeds. The proofs of our main results are based on the stability 
theory of nonlinear filters. 

1. Introduction. The interest in the stationary behavior of hidden Markov
models dates back at least to a 1957 paper by Blackwell [2], who was
motivated by the following problem from information theory. Suppose that
$(X_n)_{n\ge 0}$ is a stationary Markov chain which takes values in a finite set.
The entropy rate of such a chain admits a simple expression in terms of
its transition probabilities and stationary distribution. The purpose of the
paper by Blackwell was to obtain a similar expression for the entropy rate
of the stochastic process $Y_n = h(X_n)$, where $h$ is a noninvertible function.
The latter expression does not involve directly the stationary distribution
of the process $(X_n)_{n\ge 0}$, but rather a particular stationary distribution of
the associated filtering process $(\pi_n)_{n\ge 0}$, which is a measure-valued Markov
process defined as $\pi_n := \mathbf{P}(X_n \in \cdot \mid Y_1, \ldots, Y_n)$.

The result of Blackwell raises a natural question: does the filtering
process possess a unique stationary measure or, in other words, is the filtering
process uniquely ergodic? Blackwell conjectured that the filter is uniquely
ergodic, provided that the underlying Markov chain $(X_n)_{n\ge 0}$ is irreducible.
However, as is pointed out by Kaijser [8], one of Blackwell's own
counterexamples demonstrates that this conjecture is incorrect. The problem of
finding a complete characterization of the unique ergodicity of the filtering
process has hitherto remained open. The present paper provides one solution
to this problem (in a more general setting).

1.1. The contributions of Kaijser, Kochman and Reeds. To our knowl- 
edge, the only direct contributions to the problem studied in this paper are 
contained in Blackwell's 1957 paper [2], in a 1975 paper by Kaijser [8] and 
in two recent papers by Kochman and Reeds [10] and by Kaijser [9], which 
we presently review. 

In the 1975 paper [8], Kaijser observes that the filtering process can be 
expressed as the ratio of two quantities which are defined in terms of prod- 
ucts of random matrices. Therefore, the unique ergodicity problem can be 
studied by means of the Furstenberg-Kesten theory of random matrix prod- 
ucts. Such an analysis leads Kaijser to introduce a certain subrectangularity 
condition on the matrices that define the filter [Condition (K) in Section 6]. 
This rather strong condition is sufficient, but not necessary for unique er- 
godicity. It should be noted that Blackwell's original paper [2] already gives 
a sufficient condition for unique ergodicity, which is, however, even stronger 
than Kaijser's subrectangularity condition. 

In their 2006 paper [10], Kochman and Reeds introduce a weaker sufficient 
condition for unique ergodicity of the filter, which requires that the closure of 
a certain cone of matrices contains an element of rank one [Condition (KR) 
in Section 2.3]. Kochman and Reeds demonstrate by means of an explicit 
computation that Kaijser's condition implies the rank one condition, but a 
counterexample shows that the latter condition is strictly weaker. Besides 
providing a generalization of Kaijser's result, Kochman and Reeds employ 
a different method of proof that is based on a general result in the ergodic 
theory of Markov chains in topological state spaces (which is applied to the 
filtering process). 

Finally, in a recent paper [9], Kaijser presents an extension of the result
of Kochman and Reeds to hidden Markov models where the underlying
Markov chain $(X_n)_{n\ge 0}$ takes values in a countable state space. (It should be
noted that Kochman and Reeds, as well as Kaijser, admit a more general
observation structure than in Blackwell's original problem.) The extension is
far from straightforward, as the ergodic theory employed by Kochman and
Reeds is restricted to Markov chains in locally compact state spaces, while
the space of probability measures on a countable set is certainly not locally
compact. A large part of this lengthy paper is taken up with the development
of a rather specialized ergodic theorem for Markov chains in Polish spaces,
from which a condition similar in spirit to Kochman and Reeds' rank one
condition [Condition (B1) in Section 6] can be derived.

1.2. The approach of Kunita and filter stability. Independently from
Blackwell's unique ergodicity problem, a general study of the ergodic theory
of nonlinear filtering processes was initiated in the seminal 1971 paper of
Kunita [11]. Kunita studies a somewhat different problem, in continuous
time and with white noise type observations, but which otherwise bears
strong similarities to the problem studied by Blackwell. In contrast to the
approaches developed by Kaijser, Kochman and Reeds, who study the
equations that define the filter using general methods (products of random
matrices and ergodic theory of Markov chains), Kunita studies the nonlinear
filter directly through its characterization as a conditional expectation (an
approach we called intrinsic in [5]). The techniques developed by Kunita are
in fact extremely general and can be applied also to Blackwell's problem,
though this approach has not previously been systematically exploited.

Kunita characterizes the invariant measures of the filtering process by 
means of the convex ordering. When the signal $(X_n)_{n\ge 0}$ is uniquely ergodic,
all invariant measures of the filter are sandwiched between two distinguished 
invariant measures which are minimal and maximal with respect to the con- 
vex order, respectively (see Remark 3.2 below for a more precise statement). 
The filter is uniquely ergodic precisely when the minimal and maximal in- 
variant measures coincide. The main result of Kunita's paper claims that 
this is always the case, when the signal is ergodic in a certain sense. Un- 
fortunately, the proof of this result contains a serious gap [1]; indeed, the 
correctness of the proof is already contradicted by the counterexample given 
in Kaijser [8] (see [1, 4] for extensive discussion). 

The gap in Kunita's main result is now largely resolved [14], but under
an additional nondegeneracy assumption on the observation structure
[Condition (N) of Section 6 in the present setting]. This assumption holds, for
example, if $Y_n = h(X_n) + \varepsilon\xi_n$ where $\xi_n$ is an independent Gaussian random
variable and $\varepsilon > 0$ is an arbitrarily small noise strength, but breaks down in
the noiseless case $\varepsilon = 0$. The nondegeneracy assumption evidently captures
the phenomenon that observation noise has a stabilizing effect on the filter, 
as is the case in a large number of interesting applications. Unfortunately, 
it is the degenerate case that is chiefly of interest in Blackwell's problem, 
and unique ergodicity turns out to be more delicate in this setting as is 
demonstrated by various counterexamples [1, 8, 10]. 

In recent years, there has been considerable interest in the somewhat
different problem of filter stability (see the survey [5]). Roughly speaking,
the filtering process is called stable if $\pi_n$ becomes independent of its initial
condition $\pi_0$ as $n \to \infty$ in a certain pathwise sense (e.g., as in Theorem
3.1). However, it is now well established that when the signal $(X_n)_{n\ge 0}$ is
ergodic, filter stability and unique ergodicity of the filter are essentially
equivalent properties [4, 6, 12]. In the present setting, this has two important
consequences. First, filter stability can be used as a tool to study unique
ergodicity of the filter, a fact that is heavily exploited in this paper. Second,
previous work on the filter stability problem provides a set of sufficient
conditions for Blackwell's unique ergodicity problem which are distinct from
those proposed by Kaijser, Kochman and Reeds.

1.3. Contributions of this paper. The present paper was inspired by the
observation that the conditions of Kochman and Reeds [10] and Kaijser [9] 
are reminiscent of the filter stability property, albeit along a single sample 
path. It is therefore a natural step to make the connection with filter stabil- 
ity theory and Kunita's ergodic theory. Our results demonstrate that this 
approach is both natural and fruitful. 

Our first main result, Theorem 2.6, establishes that a certain Condition
(C) is necessary and sufficient for unique ergodicity of the filter in the case
where $X_n$ and $Y_n$ both take values in an (at most) countable state space. It
is easily shown, as we do in Section 6, that the sufficient conditions given
in Kaijser's recent paper [9] imply Condition (C). It should be noted that
the proof of Theorem 2.6 is surprisingly easy and natural, provided that
the connection between filter stability and Kunita's ergodic theory (given in
Theorem 3.1) is taken for granted.

Our second main result, Theorem 2.7, shows that the rank one Condition
(KR) of Kochman and Reeds is necessary and sufficient for unique
ergodicity of the filter in the case where $X_n$ takes values in a finite state space.
Sufficiency was already proved by Kochman and Reeds [10], though we give
here an entirely different proof of this fact by showing that Condition (KR)
implies Condition (C). The necessity of Condition (KR) is new, and answers
in the affirmative the question posed on the last page of [10]. Thus the
necessity and sufficiency of Condition (KR) provides a complete solution to
the original problem posed by Blackwell [2].

Our main results subsume all of the sufficient conditions introduced in 
the papers of Kaijser, Kochman and Reeds. In addition, we discuss in Sec- 
tion 6 some sufficient conditions of a different nature which are inherited 
from results in the filter stability literature. Though these conditions are 
not necessary, they may be substantially easier to check in practice than 
Condition (C) or (KR). Moreover, such conditions remain of independent 
interest, as we were not able to verify by an explicit computation that they 
imply Condition (C) or (KR) (of course, this implication follows indirectly 
from the necessity of these conditions). 

1.4. Organization of the paper. The remainder of this paper is organized 
as follows. In Section 2 we introduce the basic hidden Markov model, and 
we fix once and for all the notation and standing assumptions that are 
presumed to be in force throughout the paper. We also state our main results, 
Theorems 2.6 and 2.7. In Section 3, we introduce the connection between 
filter stability and unique ergodicity of the filter. The main result of this
section, Theorem 3.1, adapts the necessary theory to the setting of this 
paper and forms the foundation for the proofs of our main results. Section 
4 is devoted to the proof of Theorem 2.6, while Section 5 is devoted to the 
proof of Theorem 2.7. Section 6 develops various sufficient conditions for 
unique ergodicity within the setting of this paper. Finally, the Appendix is 
devoted to the proofs of various results that were omitted from the body of 
the paper. 

2. Preliminaries and main results. 

2.1. The canonical setup and standing assumptions. Throughout this
paper, we operate in the following setup. We consider the stochastic process
$(X_n, Y_n)_{n\in\mathbb{Z}}$, where $X_n$ takes values in the state space $E$, and $Y_n$ takes values
in the state space $F$. We will always presume that the following assumptions
are in force:

• $E$ is either finite ($E = \{1, \ldots, p\}$) or countable ($E = \mathbb{N}$).

• $F$ is a Polish space [endowed with its Borel $\sigma$-field $\mathcal{B}(F)$].

We realize the stochastic process $(X_n, Y_n)_{n\in\mathbb{Z}}$ on the canonical path space
$\Omega = \Omega^X \times \Omega^Y$ with $\Omega^X = E^{\mathbb{Z}}$ and $\Omega^Y = F^{\mathbb{Z}}$, such that $X_n(x,y) = x(n)$ and
$Y_n(x,y) = y(n)$. Denote by $\mathcal{F}$ the Borel $\sigma$-field on $\Omega$, and introduce the
$\sigma$-fields

$$\mathcal{F}^X_{m,n} = \sigma\{X_k : k \in [m,n]\}, \qquad \mathcal{F}^Y_{m,n} = \sigma\{Y_k : k \in [m,n]\}$$

for $m, n \in \mathbb{Z}$, $m \le n$. The $\sigma$-fields $\mathcal{F}^X_{-\infty,n}$, $\mathcal{F}^X_{m,\infty}$, etc., are defined in the usual
fashion (e.g., $\mathcal{F}^X_{-\infty,n} = \bigvee_{m\le n} \mathcal{F}^X_{m,n}$). For future reference, we define

$$\mathcal{G}_{m,n} = \mathcal{F}^X_{-\infty,m} \vee \mathcal{F}^Y_{-\infty,n}, \qquad \mathcal{G}_{-\infty,n} = \bigcap_{m\le n} \mathcal{G}_{m,n}$$

(note that $\mathcal{F}^Y_{-\infty,n} \subset \mathcal{G}_{-\infty,n}$, a fact that will be used frequently in the
following). Finally, the shift $\Theta : \Omega \to \Omega$ is defined as
$\Theta(x,y)(m) = (x(m+1), y(m+1))$.

We now define a probability measure on $(\Omega, \mathcal{F})$ under which $(X_k, Y_k)_{k\in\mathbb{Z}}$ is
a hidden Markov model. Our model is specified by the following ingredients:

(1) A $\sigma$-finite reference measure $\varphi$ on $F$.

(2) A nonnegative matrix function $M : F \to \mathbb{R}_+^{E\times E}$ such that

$$\sup_{i\in E} \sum_{j\in E} M_{ij}(y) < \infty \quad \text{for } \varphi\text{-a.e. } y \in F,$$

and such that the matrix

$$P = (P_{ij})_{i,j\in E}, \qquad P_{ij} = \int_F M_{ij}(y)\,\varphi(dy)$$

defines the transition matrix of an irreducible and positive recurrent
(but not necessarily aperiodic) Markov chain in the state space $E$.

As $P$ is irreducible and positive recurrent, there is a unique probability
measure $\lambda$ on $E$ that is invariant, $\lambda P = \lambda$ (as is usual, we identify measures
and functions on a countable space with row and column vectors,
respectively). A standard extension argument allows us to construct a probability
measure $\mathbf{P}$ on $(\Omega, \mathcal{F})$ under which $(X_k, Y_k)_{k\in\mathbb{Z}}$ is a stationary Markov chain
with transition probabilities

$$\mathbf{P}(X_k = j,\ Y_k \in A \mid X_{k-1} = i,\ Y_{k-1} = y) = \int_A M_{ij}(y')\,\varphi(dy')$$

for $i,j \in E$, $y \in F$, $A \in \mathcal{B}(F)$. It should be noted that under $\mathbf{P}$, the process
$(X_k)_{k\in\mathbb{Z}}$ is a stationary Markov chain with transition matrix $P$, and $(Y_k)_{k\in\mathbb{Z}}$
are conditionally independent given $(X_k)_{k\in\mathbb{Z}}$. This is precisely the defining
property of a hidden Markov model. The process $(X_k)_{k\in\mathbb{Z}}$ represents an
unobserved or "hidden" signal process, while $(Y_k)_{k\in\mathbb{Z}}$ is the observation
process. The canonical probability space $(\Omega, \mathcal{F}, \mathbf{P})$ thus constructed will remain
fixed throughout the paper.

Remark 2.1. A hidden Markov model is often assumed to satisfy the
additional assumption that $Y_k$ is a (noisy) function of $X_k$ only. In this case,
one can factor $M_{ij}(y) = P_{ij} R_j(y)$, where $R_j(y)$ is the density of
$\mathbf{P}(Y_k \in \cdot \mid X_k = j)$ with respect to $\varphi$. In the present setting, the conditional
law of $Y_k$ can depend on both $X_k$ and $X_{k-1}$. The generalization afforded by this
model is minor, but allows us to include the partitioned transition matrices
of [9, 10] as a special case.
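
The factorization in Remark 2.1 can be checked numerically on a toy model. The sketch below assumes a hypothetical two-state chain with transition matrix `P` and a binary observation with emission probabilities `R` (both invented for illustration and not taken from the paper), takes the reference measure to be the counting measure on F = {0, 1}, and verifies that summing M(y) over y recovers P.

```python
# Hypothetical two-state example; P and R are illustrative values only.
P = [[0.9, 0.1],
     [0.3, 0.7]]           # signal transition matrix P_ij
R = [[0.8, 0.2],           # R[j][y]: probability of observing y when X_k = j
     [0.25, 0.75]]

def observation_matrix(P, R, y):
    """Return M(y) with entries M_ij(y) = P_ij * R_j(y)."""
    size = len(P)
    return [[P[i][j] * R[j][y] for j in range(size)] for i in range(size)]

M = {y: observation_matrix(P, R, y) for y in (0, 1)}

# With the counting measure as reference measure, P_ij = sum over y of M_ij(y).
P_recovered = [[M[0][i][j] + M[1][i][j] for j in range(2)] for i in range(2)]
```

In the general model of this section M(y) need not factor in this way; the sketch covers only the special case described in the remark.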

Remark 2.2. The boundedness condition $\sup_{i\in E} \sum_{j\in E} M_{ij}(y) < \infty$ a.e.
is automatically satisfied in the following cases:

• When $E$ is a finite set, the condition holds trivially.

• When $E$ is countable and $F$ is at most countable, the condition
always holds. Indeed, note that in this case
$\sum_{y\in F} \sum_{j\in E} M_{ij}(y)\,\varphi(\{y\}) = \sum_{j\in E} P_{ij} = 1$,
so that $\sup_{i\in E} \sum_{j\in E} M_{ij}(y) \le \varphi(\{y\})^{-1} < \infty$ for $\varphi$-a.e. $y \in F$.

The significance of this assumption is that it ensures the Feller property of
the filter.

For any Polish space $S$ we denote by $\mathcal{B}(S)$ the Borel $\sigma$-field of $S$, by
$\mathcal{P}(S)$ the space of probability measures on $S$, and by $C_b(S)$ the space of
bounded continuous functions on $S$. We will always endow $\mathcal{P}(S)$ with the
topology of weak convergence of probability measures [recall that $\mathcal{P}(S)$ is
then itself Polish], and we write $\mu_n \Rightarrow \mu$ if the sequence $(\mu_n) \subset \mathcal{P}(S)$
converges weakly to $\mu \in \mathcal{P}(S)$. The total variation distance between probability
measures $\mu, \nu \in \mathcal{P}(S)$ is defined as

$$\|\mu - \nu\| = \sup_{\|f\|_\infty \le 1} \left| \int f\,d\mu - \int f\,d\nu \right|.$$
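
On a finite or countable set the supremum in this definition is attained at f = sign(μ − ν), so the norm reduces to the sum of |μ_i − ν_i| over all points i; with this normalization the distance ranges over [0, 2]. A minimal Python sketch for finite state spaces (the helper name `tv_distance` is ours):

```python
def tv_distance(mu, nu):
    """Total variation norm ||mu - nu|| = sum_i |mu_i - nu_i| on a finite set."""
    return sum(abs(a - b) for a, b in zip(mu, nu))

point_mass = [1.0, 0.0]
uniform = [0.5, 0.5]

# |1 - 0.5| + |0 - 0.5|
d = tv_distance(point_mass, uniform)

# Two mutually singular measures are at the maximal distance.
d_singular = tv_distance([1.0, 0.0], [0.0, 1.0])
```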

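Finally, let us recall that as $E$ is at most countable and $P$ is irreducible,
the invariant measure $\lambda$ must charge every point of $E$. Therefore $\mu \ll \lambda$ for
every $\mu \in \mathcal{P}(E)$, and we can define the probability measures $\mathbf{P}^\mu$ on $(\Omega, \mathcal{F})$
as

$$\mathbf{P}^\mu(A) = \mathbf{E}\!\left( I_A\, \frac{d\mu}{d\lambda}(X_0) \right), \qquad A \in \mathcal{F}.$$

The restriction of $\mathbf{P}^\mu$ to $\mathcal{F}^X_{0,\infty} \vee \mathcal{F}^Y_{1,\infty}$ defines a hidden Markov model with
the same transition probabilities as under $\mathbf{P}$, but with the initial distribution
$X_0 \sim \mu$. If the initial distribution is a point mass on $x \in E$, we will write $\mathbf{P}^x$
instead of $\mathbf{P}^{\delta_x}$.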

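2.2. Nonlinear filtering. The purpose of nonlinear filtering is to compute
the conditional distribution of the hidden signal given the available
observations. In this paper we will encounter several variants of the nonlinear filter,
defined as follows:

$$\pi_n = \mathbf{P}(X_n \in \cdot \mid \mathcal{F}^Y_{1,n}), \qquad \pi^\mu_n = \mathbf{P}^\mu(X_n \in \cdot \mid \mathcal{F}^Y_{1,n}), \qquad \pi^x_n = \mathbf{P}^x(X_n \in \cdot \mid \mathcal{F}^Y_{1,n})$$

for $n \in \mathbb{Z}_+$, $\mu \in \mathcal{P}(E)$, $x \in E$ (here $\pi_0 = \lambda$, $\pi^\mu_0 = \mu$ and $\pi^x_0 = \delta_x$) and

$$\pi^{\min}_n = \mathbf{P}(X_n \in \cdot \mid \mathcal{F}^Y_{-\infty,n}), \qquad \pi^{\max}_n = \mathbf{P}(X_n \in \cdot \mid \mathcal{G}_{-\infty,n})$$

for $n \in \mathbb{Z}$. Though the relevance of $\pi^{\min}_n$ and $\pi^{\max}_n$ may not be entirely evident
at present, their role will be clarified in Section 3 below.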

The following elementary results are essentially known; short proofs are
provided in Appendix A.1 for the reader's convenience.

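Lemma 2.3 (Filtering recursion). For any $\mu \in \mathcal{P}(E)$ and $m, n \in \mathbb{Z}_+$,
$n \ge m$, we have $\mathbf{P}^\mu$-a.s.

$$\pi^\mu_n = \frac{\pi^\mu_m M(Y_{m+1}) \cdots M(Y_n)}{\pi^\mu_m M(Y_{m+1}) \cdots M(Y_n)\mathbf{1}}.$$

Moreover, for any $m, n \in \mathbb{Z}$, $n \ge m$, we have $\mathbf{P}$-a.s.

$$\pi^{\min}_n = \frac{\pi^{\min}_m M(Y_{m+1}) \cdots M(Y_n)}{\pi^{\min}_m M(Y_{m+1}) \cdots M(Y_n)\mathbf{1}}, \qquad
\pi^{\max}_n = \frac{\pi^{\max}_m M(Y_{m+1}) \cdots M(Y_n)}{\pi^{\max}_m M(Y_{m+1}) \cdots M(Y_n)\mathbf{1}}.$$

The recursion for $\pi_n$, $\pi^x_n$ is obtained by choosing $\mu = \lambda$ or $\mu = \delta_x$, respectively.
It should be noted that $\pi^\mu_n$ is defined only up to a $\mathbf{P}^\mu$-null set. Indeed,

$$\mathbf{P}^\mu((Y_1, \ldots, Y_n) \in A) = \int_A \mu M(y_1) \cdots M(y_n)\mathbf{1}\ \varphi(dy_1) \cdots \varphi(dy_n),$$

that is, $\mu M(y_1) \cdots M(y_n)\mathbf{1}$ is the density of the law of $(Y_1, \ldots, Y_n)$ under
$\mathbf{P}^\mu$. Similarly, it is easily seen that $\pi^\mu_m M(y_{m+1}) \cdots M(y_n)\mathbf{1}$ is the density of
the law of $(Y_{m+1}, \ldots, Y_n)$ under the conditional measure $\mathbf{P}^\mu(\,\cdot \mid Y_1, \ldots, Y_m)$.
Therefore, the denominator in the filtering recursion can only vanish on a
$\mathbf{P}^\mu$-null set. Similar considerations hold for $\pi^{\min}_n$, $\pi^{\max}_n$, which are defined up
to a $\mathbf{P}$-null set.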
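
For a finite state space the recursion of Lemma 2.3 is a row-vector/matrix product followed by normalization. The sketch below is an illustration under stated assumptions, not the paper's implementation: the matrices in `M` are hypothetical (their sum is a stochastic matrix), and the helper names `vec_mat`, `filter_step` and `run_filter` are ours. It also checks that stepping the recursion agrees with normalizing the full matrix product in one shot.

```python
def vec_mat(v, A):
    """Row vector v times matrix A."""
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A[0]))]

def filter_step(pi, M_y):
    """One step of the recursion: pi -> pi M(y) / (pi M(y) 1)."""
    unnormalized = vec_mat(pi, M_y)
    density = sum(unnormalized)  # pi M(y) 1, the denominator of the recursion
    if density == 0.0:
        raise ValueError("observation has zero density under the current filter")
    return [u / density for u in unnormalized]

def run_filter(pi0, M, ys):
    """Apply the recursion along the observation path ys."""
    pi = list(pi0)
    for y in ys:
        pi = filter_step(pi, M[y])
    return pi

# Hypothetical observation matrices on E = F = {0, 1}; M[0] + M[1] is stochastic.
M = {0: [[0.72, 0.025], [0.24, 0.175]],
     1: [[0.18, 0.075], [0.06, 0.525]]}
ys = [0, 1, 1, 0]
mu = [0.3, 0.7]

stepwise = run_filter(mu, M, ys)

# Same quantity in one shot: mu M(y_1)...M(y_n) / (mu M(y_1)...M(y_n) 1).
v = list(mu)
for y in ys:
    v = vec_mat(v, M[y])
one_shot = [x / sum(v) for x in v]
```

Normalizing at every step, as in `run_filter`, avoids the numerical underflow that the unnormalized product would eventually suffer on long observation paths.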

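Lemma 2.4 (Markov property). $(\pi^{\min}_n)_{n\in\mathbb{Z}}$, $(\pi^{\max}_n)_{n\in\mathbb{Z}}$ are stationary
$\mathcal{P}(E)$-valued Markov chains under $\mathbf{P}$, whose transition kernel $\Pi$ is defined
by

$$\int f(\mu)\,\Pi(\nu, d\mu) = \int_F f\!\left( \frac{\nu M(y)}{\nu M(y)\mathbf{1}} \right) \nu M(y)\mathbf{1}\ \varphi(dy).$$

Similarly, $(\pi^\mu_n)_{n\in\mathbb{Z}_+}$ is a Markov chain under $\mathbf{P}^\mu$ with transition kernel $\Pi$.

Remark 2.5. As $(\pi^{\min}_n)_{n\in\mathbb{Z}}$, $(\pi^{\max}_n)_{n\in\mathbb{Z}}$ are stationary Markov chains
with transition kernel $\Pi$, the laws of $\pi^{\max}_0$ and $\pi^{\min}_0$ must be invariant for
$\Pi$. Therefore, the filter always possesses at least one invariant measure.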
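
When E and F are finite and the reference measure is the counting measure, the action of the filter transition kernel on a test function reduces to a finite sum. The sketch below assumes the form (Πf)(ν) = Σ_y f(νM(y)/νM(y)1) νM(y)1, which is what the filtering recursion and the observation density suggest; the matrices and the helper `kernel_apply` are hypothetical. For the linear function f(μ) = μ_0 this sum equals (νP)_0, so evaluating at the invariant measure λ returns λ_0.

```python
def vec_mat(v, A):
    """Row vector v times matrix A."""
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A[0]))]

def kernel_apply(f, nu, M):
    """(Pi f)(nu) = sum_y f(nu M(y) / nu M(y) 1) * nu M(y) 1 (counting measure)."""
    total = 0.0
    for M_y in M.values():
        unnormalized = vec_mat(nu, M_y)
        weight = sum(unnormalized)      # nu M(y) 1
        if weight > 0.0:                # observations of zero density contribute nothing
            total += f([u / weight for u in unnormalized]) * weight
    return total

# Hypothetical model: P = M[0] + M[1] = [[0.9, 0.1], [0.3, 0.7]].
M = {0: [[0.72, 0.025], [0.24, 0.175]],
     1: [[0.18, 0.075], [0.06, 0.525]]}
lam = [0.75, 0.25]                      # invariant measure of P: lam P = lam

mean_first_coordinate = kernel_apply(lambda mu: mu[0], lam, M)
total_mass = kernel_apply(lambda mu: 1.0, lam, M)
```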

2.3. Main results. This paper aims to resolve the following question:
when does the filter possess a unique invariant measure, that is, when does
the equation $\mathsf{M}\Pi = \mathsf{M}$ possess a unique solution $\mathsf{M} \in \mathcal{P}(\mathcal{P}(E))$?

We begin by establishing a general sufficient condition for unique ergod- 
icity, which is also necessary when the observation state space F is at most 
countable. 

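Condition (C). For every $\varepsilon > 0$, there exist an integer $N \in \mathbb{N}$ and
subsets $\mathcal{S} \subset \mathcal{P}(E)$ and $\mathcal{O} \subset F^N$ such that the following hold:

(1) $\mathbf{P}(\pi^{\min}_0 \in \mathcal{S} \text{ and } \pi^{\max}_0 \in \mathcal{S}) > 0$ and $\varphi^{\otimes N}(\mathcal{O}) > 0$.

(2) $\mu M(y_1) \cdots M(y_N)\mathbf{1} > 0$ for all $\mu \in \mathcal{S}$ and $(y_1, \ldots, y_N) \in \mathcal{O}$.

(3) For all $\mu, \nu \in \mathcal{S}$ and $(y_1, \ldots, y_N) \in \mathcal{O}$

$$\left\| \frac{\mu M(y_1) \cdots M(y_N)}{\mu M(y_1) \cdots M(y_N)\mathbf{1}} - \frac{\nu M(y_1) \cdots M(y_N)}{\nu M(y_1) \cdots M(y_N)\mathbf{1}} \right\| < \varepsilon.$$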
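
Items (2) and (3) of Condition (C) are purely algebraic and can be evaluated directly when finitely many candidate measures and observation paths are given. The sketch below is only an illustration: the finite lists `S` and `O` stand in for the sets in the condition, the matrices in `M` are hypothetical, the helper names are ours, and item (1), which involves the laws of the minimal and maximal filters, is not checked.

```python
from itertools import combinations

def vec_mat(v, A):
    """Row vector v times matrix A."""
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A[0]))]

def tv_distance(mu, nu):
    """||mu - nu|| = sum_i |mu_i - nu_i| on a finite set."""
    return sum(abs(a - b) for a, b in zip(mu, nu))

def propagate(mu, M, ys):
    """Return mu M(y_1)...M(y_N) normalized, or None if the normalizer vanishes."""
    v = list(mu)
    for y in ys:
        v = vec_mat(v, M[y])
    total = sum(v)
    if total <= 0.0:
        return None
    return [x / total for x in v]

def condition_c_gap(S, O, M):
    """Largest distance in item (3) over mu, nu in S and paths in O.
    Returns None if item (2) fails for some pair (mu, path)."""
    worst = 0.0
    for ys in O:
        images = [propagate(mu, M, ys) for mu in S]
        if any(image is None for image in images):
            return None
        for a, b in combinations(images, 2):
            worst = max(worst, tv_distance(a, b))
    return worst

# Hypothetical strictly positive observation matrices on E = F = {0, 1}.
M = {0: [[0.72, 0.025], [0.24, 0.175]],
     1: [[0.18, 0.075], [0.06, 0.525]]}
S = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

gap_short = condition_c_gap(S, [(0,)], M)       # N = 1
gap_long = condition_c_gap(S, [(0,) * 10], M)   # N = 10
```

For these strictly positive matrices the gap shrinks quickly as N grows, which is the behavior item (3) asks for on a suitable set of paths.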

Theorem 2.6. Suppose that Condition (C) holds. Then the filter admits
a unique invariant measure $\mathsf{M}$, and we have $n^{-1} \sum_{k=1}^n \mathsf{M}_0 \Pi^k \Rightarrow \mathsf{M}$ as $n \to \infty$
for any $\mathsf{M}_0 \in \mathcal{P}(\mathcal{P}(E))$. If, in addition, the signal transition matrix $P$ is
aperiodic, then we have $\mathsf{M}_0 \Pi^n \Rightarrow \mathsf{M}$ as $n \to \infty$ for any $\mathsf{M}_0 \in \mathcal{P}(\mathcal{P}(E))$.

Conversely, suppose that the observation state space $F$ is a finite or
countable set, and that the filter is uniquely ergodic. Then Condition (C) holds.

The proof of this result is given in Section 4. 

Next, we consider the following condition, due to Kochman and Reeds 
[10], for the case where the signal state space E is a finite set. 

Condition (KR). Let $E$ be a finite set, and define the cone of matrices

$$\mathcal{K} = \{ c\,M(y_1) \cdots M(y_n) : n \in \mathbb{N},\ y_1, \ldots, y_n \in F,\ c \in \mathbb{R}_+ \}.$$

Then the closure $\mathrm{cl}\,\mathcal{K}$ contains a matrix of rank 1.
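
One rough way to probe Condition (KR) numerically is to follow products of observation matrices and measure how far they are from rank one after rescaling. A nonzero matrix has rank one exactly when all of its 2×2 minors vanish, so the sketch below uses the largest absolute 2×2 minor of the product, scaled so that the entries sum to one, as a defect. This is a heuristic along individual sequences only, not a decision procedure; the matrices and the helper names are hypothetical.

```python
def mat_mul(A, B):
    """Matrix product of A and B (lists of rows)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def product(mats):
    """Ordered product mats[0] mats[1] ... mats[-1]."""
    out = mats[0]
    for A in mats[1:]:
        out = mat_mul(out, A)
    return out

def rank_one_defect(A):
    """Largest |2x2 minor| of A after scaling its entries to sum to one.
    For a nonzero nonnegative matrix this is zero exactly when rank(A) = 1."""
    s = sum(sum(row) for row in A)
    rows, cols = len(A), len(A[0])
    worst = 0.0
    for i in range(rows):
        for k in range(i + 1, rows):
            for j in range(cols):
                for l in range(j + 1, cols):
                    minor = A[i][j] * A[k][l] - A[i][l] * A[k][j]
                    worst = max(worst, abs(minor) / (s * s))
    return worst

# Hypothetical strictly positive matrix: powers approach rank one after scaling.
positive = [[0.72, 0.025], [0.24, 0.175]]
defect_positive = rank_one_defect(product([positive] * 10))

# Hypothetical permutation-type pair: every product is a scaled permutation
# matrix, so the defect never decreases.
half_identity = [[0.5, 0.0], [0.0, 0.5]]
half_swap = [[0.0, 0.5], [0.5, 0.0]]
defect_permutation = rank_one_defect(product([half_identity, half_swap] * 5))
```

Because the defect is invariant under multiplication by a positive constant, it depends only on the ray generated by the product, which matches the cone structure in the condition.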

Kochman and Reeds prove that this condition is sufficient for uniqueness 
of the invariant measure of the filter (in [10] , both E and F are presumed 
to be finite). The following result shows that Condition (KR) is in fact 
equivalent to unique ergodicity of the filter, as well as to Condition (C) 
above, when the signal state space is a finite set. This provides a complete 
solution to a problem posed by Blackwell [2], and answers in the affirmative 
the question posed at the end of [10]. 

Theorem 2.7. Suppose $E$ is a finite set and that one of the following
holds:

• $F$ is a finite or countable set, and $\varphi$ is the counting measure; or

• $F = \mathbb{R}^d$, $\varphi$ is the Lebesgue measure, and $y \mapsto M(y)$ is continuous.

Then the following are equivalent:

(1) The filter admits a unique invariant measure $\mathsf{M}$.

(2) Condition (KR) holds.

(3) Condition (C) holds.

When any of these conditions hold, we have $n^{-1} \sum_{k=1}^n \mathsf{M}_0 \Pi^k \Rightarrow \mathsf{M}$ as $n \to \infty$
for any $\mathsf{M}_0 \in \mathcal{P}(\mathcal{P}(E))$. If, in addition, the signal transition matrix $P$ is
aperiodic, then we have $\mathsf{M}_0 \Pi^n \Rightarrow \mathsf{M}$ as $n \to \infty$ for any $\mathsf{M}_0 \in \mathcal{P}(\mathcal{P}(E))$.

The proof will be given in Section 5. 

Finally, various sufficient conditions for unique ergodicity of the filter
were given by Kaijser [8, 9]. These conditions are easily shown to imply
Condition (C), as is discussed in Section 6. We therefore reproduce Kaijser's
results using a much simpler proof. Similarly, various conditions that have
been introduced in the context of filter stability [5, 14, 15] are shown in
Section 6 to imply unique ergodicity of the filter. None of the latter sufficient
conditions is also necessary; however, when they apply, they are often easier
to check than Condition (C) or (KR).

3. Ergodic theory and stability of nonlinear filters. The proofs of our 
main results are based on a general circle of ideas connecting the ergodic 
theory [11, 12] and asymptotic stability [5, 14] of nonlinear filters. Indeed, it 
is by now well established [4, 6] that unique ergodicity and stability of the 
filter are essentially equivalent properties. The purpose of this section is to 
introduce the relevant results in this direction that will be needed in what 
follows. Though the results in this section are adapted to the setting of this 
paper, their proofs largely follow along the lines of [6, 11, 12, 14]. We have 
therefore relegated the proofs to the Appendix. 

The following characterization will be of central importance. 

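Theorem 3.1. Consider the following conditions:

(1) The filter possesses a unique invariant measure $\mathsf{M} \in \mathcal{P}(\mathcal{P}(E))$.

(2) $\pi^{\max}_0 = \pi^{\min}_0$ $\mathbf{P}$-a.s.

(3) $\|\pi^\mu_n - \pi^\nu_n\| \to 0$ as $n \to \infty$ $\mathbf{P}^\mu$-a.s. whenever $\mu \ll \nu$.

(4) $n^{-1} \sum_{k=1}^n \mathsf{M}_0 \Pi^k \Rightarrow \mathsf{M}$ as $n \to \infty$ for any $\mathsf{M}_0 \in \mathcal{P}(\mathcal{P}(E))$.

(5) $\mathsf{M}_0 \Pi^n \Rightarrow \mathsf{M}$ as $n \to \infty$ for any $\mathsf{M}_0 \in \mathcal{P}(\mathcal{P}(E))$.

Conditions 1-4 are equivalent. If, in addition, the signal transition matrix
$P$ is aperiodic, then conditions 1-5 are equivalent.

The proof is given in Appendix A.2.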

Remark 3.2. Condition 1 is the desired unique ergodicity property of 
the filter. Condition 3 is the filter stability property. Conditions 4 and 5 
characterize the convergence of the law of the filter to the invariant measure. 

Condition 2 in Theorem 3.1 stems from an ingenious device introduced
by Kunita in the seminal paper [11] and used in the proof of Theorem 3.1.
By Lemma 2.4, $(\pi^{\min}_n)_{n\in\mathbb{Z}}$ and $(\pi^{\max}_n)_{n\in\mathbb{Z}}$ are stationary Markov processes.
Therefore, the laws $\mathsf{M}^{\max}, \mathsf{M}^{\min} \in \mathcal{P}(\mathcal{P}(E))$ of the $\mathcal{P}(E)$-valued random
variables $\pi^{\max}_0$, $\pi^{\min}_0$ are invariant for the filter transition kernel $\Pi$. Kunita shows
that any invariant measure $\mathsf{M}$ for $\Pi$ is sandwiched between $\mathsf{M}^{\max}$ and $\mathsf{M}^{\min}$
in the sense that

$$\int f(\mu)\,\mathsf{M}^{\min}(d\mu) \le \int f(\mu)\,\mathsf{M}(d\mu) \le \int f(\mu)\,\mathsf{M}^{\max}(d\mu)$$

for every convex function $f \in C_b(\mathcal{P}(E))$. In other words, within the family of
$\Pi$-invariant measures, $\mathsf{M}^{\min}$ is minimal and $\mathsf{M}^{\max}$ is maximal with respect
to the convex ordering. The identity $\pi^{\max}_0 = \pi^{\min}_0$ ensures that the maximal
and minimal invariant measures are identical, so that there can be only one
invariant measure.

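Example 3.3. Some intuition may be obtained from the following
simple example [10], which is a typical case where the filter fails to be uniquely
ergodic. Let $E = F = \{0, 1\}$ (endowed with the counting measure), and let

$$M(0) = \frac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad M(1) = \frac{1}{2}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$

Note that $P = M(0) + M(1)$ is irreducible and aperiodic with invariant
measure $\lambda_0 = \lambda_1 = 1/2$, and $Y_k = I_{\{X_{k-1} \ne X_k\}}$ for all $k \ge 1$.

As $(Y_k)_{k\ge 1}$ reveals exactly when the transitions of $(X_k)_{k\ge 0}$ occur, we
evidently have that $X_k$ is $\sigma\{X_m, Y_{m+1}, Y_{m+2}, \ldots, Y_n\}$-measurable for every
$m \le k \le n$. It follows that $\mathcal{G}_{m,n} = \mathcal{F}^X_{-\infty,n} \vee \mathcal{F}^Y_{-\infty,n}$ for every $m < n$, so that in
particular $\mathcal{G}_{-\infty,n} = \mathcal{F}^X_{-\infty,n} \vee \mathcal{F}^Y_{-\infty,n}$. Therefore

$$\pi^{\max}_n = \delta_{X_n}, \qquad \mathsf{M}^{\max} = \frac{\delta_{\delta_0} + \delta_{\delta_1}}{2}.$$

On the other hand, it follows immediately from the filtering recursion that
$\pi_n = \lambda$ for all $n$. It is therefore not difficult to show that $\pi^{\min}_n = \lambda$ also, so
that

$$\pi^{\min}_n = \lambda, \qquad \mathsf{M}^{\min} = \delta_\lambda = \delta_{(\delta_0 + \delta_1)/2}.$$

With a little more work, one can show that any invariant measure $\mathsf{M}$ is of
the form

$$\mathsf{M} = \int_0^{1/2} \frac{\delta_{\theta\delta_0 + (1-\theta)\delta_1} + \delta_{(1-\theta)\delta_0 + \theta\delta_1}}{2}\ m(d\theta)$$

for some probability measure $m$ on $[0, 1/2]$. It is easily seen that any such
$\mathsf{M}$ does indeed lie between $\mathsf{M}^{\max}$ and $\mathsf{M}^{\min}$ in the convex ordering.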
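
The failure of stability in this example is easy to observe numerically. The sketch below assumes that the two observation matrices are M(0) = I/2 and M(1) = J/2, where J swaps the two states (our reading of the example), and runs the filtering recursion from λ and from δ_0 along an arbitrary observation path. The filter started at λ never moves, the filter started at δ_0 remains a point mass, and their total variation distance stays equal to 1.

```python
def vec_mat(v, A):
    """Row vector v times matrix A."""
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A[0]))]

def filter_step(pi, M_y):
    """One step of the recursion: pi -> pi M(y) / (pi M(y) 1)."""
    unnormalized = vec_mat(pi, M_y)
    density = sum(unnormalized)
    return [u / density for u in unnormalized]

def tv_distance(mu, nu):
    """||mu - nu|| = sum_i |mu_i - nu_i| on a finite set."""
    return sum(abs(a - b) for a, b in zip(mu, nu))

# Assumed matrices of the example: observing 0 means "no transition",
# observing 1 means "the signal changed state".
M = {0: [[0.5, 0.0], [0.0, 0.5]],
     1: [[0.0, 0.5], [0.5, 0.0]]}

ys = [0, 1, 1, 0, 1]        # arbitrary observation path
pi_lambda = [0.5, 0.5]      # filter started at the invariant measure
pi_delta = [1.0, 0.0]       # filter started at the point mass on state 0
distances = []
for y in ys:
    pi_lambda = filter_step(pi_lambda, M[y])
    pi_delta = filter_step(pi_delta, M[y])
    distances.append(tv_distance(pi_lambda, pi_delta))
```

Since δ_0 is absolutely continuous with respect to λ, a distance that does not tend to zero is exactly a failure of the filter stability property in condition 3 of Theorem 3.1.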

Besides the characterization of unique ergodicity in Theorem 3.1, we will
require the following convergence property which holds regardless of unique
ergodicity.

Lemma 3.4. $\lim_{n\to\infty} \|\pi^{\max}_n - \pi^{\min}_n\|$ exists $\mathbf{P}$-a.s.

The proof of this result is also given in Appendix A.2. Its relevance is due
to the following observation. In order to prove $\pi^{\max}_0 = \pi^{\min}_0$ (hence unique
ergodicity by Theorem 3.1), it suffices to show that $\lim_{n\to\infty} \|\pi^{\max}_n - \pi^{\min}_n\| = 0$
$\mathbf{P}$-a.s., as $(\pi^{\min}_n)_{n\in\mathbb{Z}}$ and $(\pi^{\max}_n)_{n\in\mathbb{Z}}$ are stationary processes. But by virtue
of Lemma 3.4, it then suffices to show only that $\|\pi^{\max}_n - \pi^{\min}_n\|$ converges to
zero along a sequence of stopping times. The main idea behind the proof of
Theorem 2.6 is that Condition (C) allows us to construct explicitly such a
sequence of stopping times.

4. Proof of Theorem 2.6.

4.1. Sufficiency of Condition (C). We will need the following lemma. 

Lemma 4.1. The sequence $(X_k, Y_k)_{k\in\mathbb{Z}}$ is ergodic under $\mathbf{P}$.

Proof. As the signal transition matrix $P$ is presumed to be irreducible
and positive recurrent, it is easily established that the pair $(X_k, Y_k)_{k\in\mathbb{Z}}$ is a
Markov process that possesses a unique invariant measure. This measure is
therefore trivially an extreme point of the set of invariant measures, hence
ergodic. □

We will also use the following simple result. 

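Lemma 4.2. Let the sets $\mathcal{S} \subset \mathcal{P}(E)$ and $\mathcal{O} \subset F^N$ be as in Condition (C).
Then we have $\mathbf{P}(\pi^{\min}_0 \in \mathcal{S},\ \pi^{\max}_0 \in \mathcal{S},\ (Y_1, \ldots, Y_N) \in \mathcal{O}) > 0$.

Proof. Let us write for simplicity $Y = (Y_1, \ldots, Y_N)$. As $\pi^{\min}_0$ and $\pi^{\max}_0$
are $\mathcal{G}_{-\infty,0}$-measurable by construction, we have

$$\mathbf{P}(\pi^{\min}_0 \in \mathcal{S},\ \pi^{\max}_0 \in \mathcal{S},\ Y \in \mathcal{O})
= \mathbf{E}\big( I_{\mathcal{S}}(\pi^{\min}_0)\, I_{\mathcal{S}}(\pi^{\max}_0)\, \mathbf{P}(Y \in \mathcal{O} \mid \mathcal{G}_{-\infty,0}) \big)$$

$$= \mathbf{E}\left( I_{\mathcal{S}}(\pi^{\min}_0)\, I_{\mathcal{S}}(\pi^{\max}_0) \int_{\mathcal{O}} \pi^{\max}_0 M(y_1) \cdots M(y_N)\mathbf{1}\ \varphi(dy_1) \cdots \varphi(dy_N) \right).$$

It is now easily seen that Condition (C) implies the result. □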

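We now proceed with the proof of the sufficiency part of Theorem 2.6.
Suppose that Condition (C) holds, and fix an arbitrary decreasing sequence
$\varepsilon_k \searrow 0$. Then for every $k$ we can find $N_k \in \mathbb{N}$, $\mathcal{S}_k \subset \mathcal{P}(E)$, and $\mathcal{O}_k \subset F^{N_k}$
such that the properties 1-3 of Condition (C) are satisfied. Define the events

$$A_{n,k} = \{ \pi^{\min}_n \in \mathcal{S}_k,\ \pi^{\max}_n \in \mathcal{S}_k,\ (Y_{n+1}, \ldots, Y_{n+N_k}) \in \mathcal{O}_k \}.$$

Then, by the stationarity of $(X_n, Y_n, \pi^{\min}_n, \pi^{\max}_n)_{n\in\mathbb{Z}}$ (Lemma A.1), we have

$$\lim_{T\to\infty} \frac{1}{T} \sum_{n=1}^T I_{A_{n,k}} = \lim_{T\to\infty} \frac{1}{T} \sum_{n=1}^T I_{A_{0,k}} \circ \Theta^n = \mathbf{P}(A_{0,k}) > 0 \quad \mathbf{P}\text{-a.s.},$$

where we have used Birkhoff's ergodic theorem together with Lemmas 4.1
and 4.2. Thus, for any $k$, the event $A_{n,k}$ occurs at a positive rate, so that
certainly

$$\mathbf{P}\Big( \limsup_{n\to\infty} A_{n,k} \Big) = 1 \quad \text{for all } k \ge 1.$$

Now define the stopping times $\tau_0 = 0$ and

$$\tau_k = \min\{ n > \tau_{k-1} : \pi^{\min}_{n-N_k} \in \mathcal{S}_k,\ \pi^{\max}_{n-N_k} \in \mathcal{S}_k,\ (Y_{n-N_k+1}, \ldots, Y_n) \in \mathcal{O}_k \}$$

for any $k \ge 1$. It follows directly that

$$\mathbf{P}(\tau_k < \infty \text{ for all } k) \ge \mathbf{P}\left( \bigcap_{k=1}^\infty \limsup_{n\to\infty} A_{n,k} \right) = 1.$$

Moreover, by Condition (C) and Lemma 2.3, we have

$$\| \pi^{\max}_{\tau_k} - \pi^{\min}_{\tau_k} \| =
\left\| \frac{\pi^{\max}_{\tau_k - N_k} M(Y_{\tau_k - N_k + 1}) \cdots M(Y_{\tau_k})}{\pi^{\max}_{\tau_k - N_k} M(Y_{\tau_k - N_k + 1}) \cdots M(Y_{\tau_k})\mathbf{1}}
- \frac{\pi^{\min}_{\tau_k - N_k} M(Y_{\tau_k - N_k + 1}) \cdots M(Y_{\tau_k})}{\pi^{\min}_{\tau_k - N_k} M(Y_{\tau_k - N_k + 1}) \cdots M(Y_{\tau_k})\mathbf{1}} \right\| < \varepsilon_k$$

for all $k \ge 1$ $\mathbf{P}$-a.s. Therefore, Lemma 3.4 shows that $\|\pi^{\max}_n - \pi^{\min}_n\| \to 0$ as
$n \to \infty$ $\mathbf{P}$-a.s. But using the stationarity of $(\pi^{\min}_n, \pi^{\max}_n)_{n\in\mathbb{Z}}$ (Lemma A.1)
and the dominated convergence theorem, we find that

$$\mathbf{E}(\|\pi^{\max}_0 - \pi^{\min}_0\|) = \mathbf{E}(\|\pi^{\max}_n - \pi^{\min}_n\|) \xrightarrow{\ n\to\infty\ } 0,$$

so that evidently $\pi^{\max}_0 = \pi^{\min}_0$ $\mathbf{P}$-a.s. The sufficiency part of Theorem 2.6
now follows immediately from Theorem 3.1.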

4.2. Necessity of Condition (C). Throughout this subsection, we assume 
that the observation state space F is finite or countable, and that the filter 
possesses a unique invariant measure. We aim to show that Condition (C) 
must hold. 

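Denote by $\mathsf{M}$ the law of $\pi^{\min}_0$. Thus $\mathsf{M}$ is invariant, hence the unique
invariant measure of the filter. Fix an arbitrary state $i \in E$, and note that

$$\mathbf{E}((\pi^{\min}_0)_i) = \lambda_i > 0 \quad \text{implies} \quad \mathbf{P}((\pi^{\min}_0)_i > \lambda_i/2) > 0.$$

Therefore, writing $\mathcal{R} = \{ \mu \in \mathcal{P}(E) : \mu_i > \lambda_i/2 \}$, we have $\mathsf{M}(\mathcal{R}) > 0$. We can
thus define the probability measure $\mathsf{M}_{\mathcal{R}}(\cdot) := \mathsf{M}(\,\cdot \cap \mathcal{R})/\mathsf{M}(\mathcal{R})$.

Now note that, by Theorem 3.1, we have $\|\pi^i_n - \pi^\mu_n\| \to 0$ $\mathbf{P}^i$-a.s. for all
$\mu \in \mathcal{R}$. In particular, the set of points $((y_k)_{k\in\mathbb{N}}, \mu) \in F^{\mathbb{N}} \times \mathcal{P}(E)$ such that

$$\left\| \frac{\delta_i M(y_1) \cdots M(y_n)}{\delta_i M(y_1) \cdots M(y_n)\mathbf{1}} - \frac{\mu M(y_1) \cdots M(y_n)}{\mu M(y_1) \cdots M(y_n)\mathbf{1}} \right\| \xrightarrow{\ n\to\infty\ } 0$$

has $\mathbf{P}^i \otimes \mathsf{M}_{\mathcal{R}}$-full measure. It follows that for $\mathbf{P}^i$-a.e. path $(y_k)_{k\in\mathbb{N}}$, the above
convergence holds for $\mathsf{M}_{\mathcal{R}}$-a.e. $\mu$. Therefore, as $F$ is at most countable [so
the law of $(Y_1, \ldots, Y_n)$ is atomic for all $n < \infty$], we can certainly find a single
sequence $(y_k)_{k\in\mathbb{N}}$ with $\mathbf{P}^i(Y_1 = y_1, \ldots, Y_n = y_n) > 0$ for all $n < \infty$ such that

$$\left\| \frac{\delta_i M(y_1) \cdots M(y_n)}{\delta_i M(y_1) \cdots M(y_n)\mathbf{1}} - \frac{\mu M(y_1) \cdots M(y_n)}{\mu M(y_1) \cdots M(y_n)\mathbf{1}} \right\| \xrightarrow{\ n\to\infty\ } 0 \quad \text{for } \mathsf{M}_{\mathcal{R}}\text{-a.e. } \mu.$$

By Egorov's theorem, there is a subset $\mathcal{S} \subset \mathcal{R}$ with $\mathsf{M}_{\mathcal{R}}(\mathcal{S}) > 0$ such that

$$\sup_{\mu\in\mathcal{S}} \left\| \frac{\delta_i M(y_1) \cdots M(y_n)}{\delta_i M(y_1) \cdots M(y_n)\mathbf{1}} - \frac{\mu M(y_1) \cdots M(y_n)}{\mu M(y_1) \cdots M(y_n)\mathbf{1}} \right\| \xrightarrow{\ n\to\infty\ } 0.$$

We are now in the position to show that Condition (C) holds true. Given
$\varepsilon > 0$, we first choose the integer $N \in \mathbb{N}$ large enough so that

$$\sup_{\mu\in\mathcal{S}} \left\| \frac{\delta_i M(y_1) \cdots M(y_N)}{\delta_i M(y_1) \cdots M(y_N)\mathbf{1}} - \frac{\mu M(y_1) \cdots M(y_N)}{\mu M(y_1) \cdots M(y_N)\mathbf{1}} \right\| < \frac{\varepsilon}{2}.$$

We let $\mathcal{S}$ be as above and define the singleton $\mathcal{O} = \{(y_1, \ldots, y_N)\}$. By
Theorem 3.1, we have $\pi^{\max}_0 = \pi^{\min}_0$ and therefore

$$\mathbf{P}(\pi^{\max}_0 \in \mathcal{S} \text{ and } \pi^{\min}_0 \in \mathcal{S}) = \mathbf{P}(\pi^{\min}_0 \in \mathcal{S}) = \mathsf{M}(\mathcal{S}) \ge \mathsf{M}(\mathcal{R})\,\mathsf{M}_{\mathcal{R}}(\mathcal{S}) > 0.$$

Next, we note that as $\mathbf{P}^i(Y_1 = y_1, \ldots, Y_N = y_N) > 0$,

$$\mathbf{P}^i((Y_1, \ldots, Y_N) \in \mathcal{O}) = \delta_i M(y_1) \cdots M(y_N)\mathbf{1}\ \varphi(\{y_1\}) \cdots \varphi(\{y_N\}) > 0,$$

so $\varphi^{\otimes N}(\mathcal{O}) > 0$ and $\mu M(y_1) \cdots M(y_N)\mathbf{1} > 0$ for all $\mu \in \mathcal{S}$ (this holds by the
definition of $\mathcal{R}$ and as $\mathcal{S} \subset \mathcal{R}$). Finally, by the triangle inequality,

$$\sup_{\mu,\nu\in\mathcal{S}} \left\| \frac{\mu M(y_1) \cdots M(y_N)}{\mu M(y_1) \cdots M(y_N)\mathbf{1}} - \frac{\nu M(y_1) \cdots M(y_N)}{\nu M(y_1) \cdots M(y_N)\mathbf{1}} \right\| < \varepsilon.$$

Thus Condition (C) is satisfied, and the proof is complete.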



5. Proof of Theorem 2.7. The implication 3 ⇒ 1 is already established
by Theorem 2.6. It therefore suffices to prove the implications 1 ⇒ 2 and
2 ⇒ 3.



5.1. Proof of 1 => 2. We will need the following lemma. 



Lemma 5.1. Let $(\Omega,\mathcal{G},(\mathcal{G}_n)_{n\in\mathbb{N}},\mathsf{P})$ be a filtered probability space, and let $Q,Q'$ be mutually singular probability measures on $\mathcal{G}$. Suppose that $Q,Q'$ are locally absolutely continuous with respect to $\mathsf{P}$, that is, $Q|_{\mathcal{G}_n}\ll\mathsf{P}|_{\mathcal{G}_n}$ and $Q'|_{\mathcal{G}_n}\ll\mathsf{P}|_{\mathcal{G}_n}$ with densities $q_n$ and $q'_n$, respectively. Then $q_n/q'_n\to0$ as $n\to\infty$ $Q'$-a.s.

Proof. Let $\mathsf{R}=(\mathsf{P}+Q+Q')/3$, and let $r_n$ be the density of $\mathsf{P}|_{\mathcal{G}_n}$ with respect to $\mathsf{R}|_{\mathcal{G}_n}$. Then we have $q_nr_n\to dQ/d\mathsf{R}$ and $q'_nr_n\to dQ'/d\mathsf{R}$ $\mathsf{R}$-a.s., hence $Q'$-a.s. as $Q'\ll\mathsf{R}$. But $dQ/d\mathsf{R}=0$ $Q'$-a.s. and $dQ'/d\mathsf{R}>0$ $Q'$-a.s. by the mutual singularity of $Q$ and $Q'$. The claim follows directly. □
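
The conclusion of Lemma 5.1 can be illustrated numerically. The sketch below is not taken from the paper: it assumes a hypothetical coin-flipping model in which $Q$ and $Q'$ are the laws of i.i.d. flips with heads probabilities 0.3 and 0.6 (mutually singular on infinite sequences by the law of large numbers, but locally absolutely continuous with respect to the fair-coin law). The ratio $q_n/q'_n$ does not depend on the reference measure, so the code tracks its logarithm along one path sampled under $Q'$.

```python
import math
import random


def log_likelihood_ratio(n, p_q=0.3, p_q_prime=0.6, seed=0):
    """Return log(q_n / q'_n) along one path of n coin flips drawn under Q'.

    p_q and p_q_prime are the (hypothetical) heads probabilities under Q and Q'.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        heads = rng.random() < p_q_prime  # sample the flip under Q'
        if heads:
            total += math.log(p_q / p_q_prime)
        else:
            total += math.log((1.0 - p_q) / (1.0 - p_q_prime))
    return total


# The log-ratio drifts to minus infinity, i.e. q_n / q'_n -> 0 along the path.
print(log_likelihood_ratio(2000))
```

On average the log-ratio decreases by the relative entropy of $Q'$ with respect to $Q$ per flip, which is why the ratio collapses exponentially fast in this toy setting.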

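We consider the finite state space $E=\{1,\ldots,p\}$. Let us write

$$\Omega_1,\ldots,\Omega_p\subset F^{\mathbb{N}},\qquad\Omega_i=\operatorname{supp}\mathbf{P}^i|_{\mathcal{F}^Y_{1,\infty}}.$$

There exists a finite partition $\{A_1,\ldots,A_k\}$ of $F^{\mathbb{N}}$ such that $\sigma\{A_1,\ldots,A_k\}=\sigma\{\Omega_1,\ldots,\Omega_p\}$. We may assume without loss of generality that

$$\mathbf{P}^i((Y_k)_{k\in\mathbb{N}}\in A_1)>0\quad\text{for }i=1,\ldots,q,\qquad\mathbf{P}^i((Y_k)_{k\in\mathbb{N}}\in A_1)=0\quad\text{for }i=q+1,\ldots,p$$

for some $q\in E$ (this can always be accomplished by relabeling the points of the state space). Define $\tilde{\mathbf{P}}(\cdot)=\mathbf{P}^1(\,\cdot\,|(Y_k)_{k\in\mathbb{N}}\in A_1)$. Then, by construction, $\tilde{\mathbf{P}}|_{\mathcal{F}^Y_{1,\infty}}\ll\mathbf{P}^i|_{\mathcal{F}^Y_{1,\infty}}$ for $i=1,\ldots,q$ and $\tilde{\mathbf{P}}|_{\mathcal{F}^Y_{1,\infty}}\perp\mathbf{P}^i|_{\mathcal{F}^Y_{1,\infty}}$ for $i=q+1,\ldots,p$.

We assume that the filter is uniquely ergodic, so that $\|\pi_n^x-\pi_n\|\to0$ $\mathbf{P}^x$-a.s. for every $x\in E$ by Theorem 3.1 (this follows as $\lambda$ charges all points in $E$, so we certainly have $\delta_x\ll\lambda$ for all $x\in E$). Therefore, we find that

$$\lim_{n\to\infty}\|\pi_n^x-\pi_n\|=0\quad\text{for }x=1,\ldots,q,\quad\tilde{\mathbf{P}}\text{-a.s.}$$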

Denote by $q_n^i$ the density of $\mathbf{P}^i((Y_1,\ldots,Y_n)\in\cdot\,)$ with respect to $\varphi^{\otimes n}$, and similarly denote by $\tilde q_n$ the density of $\tilde{\mathbf{P}}((Y_1,\ldots,Y_n)\in\cdot\,)$ with respect to $\varphi^{\otimes n}$. Then

$$\tilde q_n,\ q_n^x>0\quad\text{for all }n\in\mathbb{N},\ x=1,\ldots,q\quad\text{and}\quad\lim_{n\to\infty}\frac{\tilde q_n}{q_n^1}<\infty\quad\tilde{\mathbf{P}}\text{-a.s.}$$

(the latter follows as $\tilde q_n/q_n^1$ is a uniformly integrable martingale under $\mathbf{P}^1$), while

$$\lim_{n\to\infty}\frac{q_n^i}{\tilde q_n}=0\quad\text{for }i=q+1,\ldots,p,\quad\tilde{\mathbf{P}}\text{-a.s.}$$

by Lemma 5.1. (The fact that $\varphi$ may be any $\sigma$-finite measure does not preclude the application of Lemma 5.1, as $\varphi$ can always be transformed into a probability measure by means of an equivalent change of measure.)

As all of the above statements hold $\tilde{\mathbf{P}}$-a.s., we can certainly find one sample path $(y_k)_{k\in\mathbb{N}}$ on which all these statements hold simultaneously. In particular, we have

$$\left\|\frac{\delta_xM(y_1)\cdots M(y_n)}{\delta_xM(y_1)\cdots M(y_n)1}-\frac{\lambda M(y_1)\cdots M(y_n)}{\lambda M(y_1)\cdots M(y_n)1}\right\|\xrightarrow{n\to\infty}0\quad\text{for }x=1,\ldots,q,$$

as well as

$$\frac{\delta_iM(y_1)\cdots M(y_n)1}{\delta_1M(y_1)\cdots M(y_n)1}=\frac{q_n^i(y_1,\ldots,y_n)}{q_n^1(y_1,\ldots,y_n)}\xrightarrow{n\to\infty}0\quad\text{for }i=q+1,\ldots,p.$$

Now define the matrix norm $\|M\|:=\sup_{\|f\|_\infty\le1}\sup_{\|\mu\|_1\le1}\mu Mf$. As the set of matrices of unit norm is compact (as we are in the finite-dimensional setting), there must be a subsequence $n_k\nearrow\infty$ and a matrix $M_\infty$ such that

$$\frac{M(y_1)\cdots M(y_{n_k})}{\|M(y_1)\cdots M(y_{n_k})\|}\xrightarrow{k\to\infty}M_\infty.$$

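We claim that $M_\infty$ is a rank 1 matrix. Indeed, for $i=q+1,\ldots,p$ we have

$$\lim_{k\to\infty}\frac{\|\delta_iM(y_1)\cdots M(y_{n_k})\|}{\|M(y_1)\cdots M(y_{n_k})\|}\le\lim_{k\to\infty}\frac{\delta_iM(y_1)\cdots M(y_{n_k})1}{\delta_1M(y_1)\cdots M(y_{n_k})1}=0.$$

On the other hand, consider a state $x\in\{1,\ldots,q\}$ such that $\|\delta_xM_\infty\|>0$. Then $\delta_xM_\infty1=\|\delta_xM_\infty\|>0$, and thus also $\lambda M_\infty1>0$. But then

$$\left\|\frac{\delta_xM_\infty}{\delta_xM_\infty1}-\frac{\lambda M_\infty}{\lambda M_\infty1}\right\|=\lim_{k\to\infty}\left\|\frac{\delta_xM(y_1)\cdots M(y_{n_k})}{\delta_xM(y_1)\cdots M(y_{n_k})1}-\frac{\lambda M(y_1)\cdots M(y_{n_k})}{\lambda M(y_1)\cdots M(y_{n_k})1}\right\|=0.$$

Therefore, we have shown that for every $j=1,\ldots,p$, the $j$th row of $M_\infty$ is either zero, or a multiple of the row vector $\lambda M_\infty$. Moreover, $M_\infty$ is not identically zero as $\|M_\infty\|=1$. Thus $M_\infty$ is a rank 1 matrix, and Condition (KR) follows.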
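
The appearance of a rank 1 limit can be observed numerically. The following sketch is only an illustration under assumed data: the two-state transition matrix `P`, the emission table `B`, the observation path and the helper names are all hypothetical, and the matrices are strictly positive, a much stronger assumption than anything used in the proof. It normalizes the product $M(y_1)\cdots M(y_n)$ by the matrix norm defined above (for which the supremum is the largest absolute row sum) and measures how far the normalized rows are from being proportional.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matrix_norm(m):
    # sup of mu M f over |f|_inf <= 1 and |mu|_1 <= 1 is the largest absolute row sum.
    return max(sum(abs(x) for x in row) for row in m)


def normalized_product(mats, path):
    """Return M(y_1)...M(y_n) / ||M(y_1)...M(y_n)||, rescaling at every step."""
    size = len(mats[0])
    prod = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for y in path:
        prod = matmul(prod, mats[y])
        scale = matrix_norm(prod)
        prod = [[x / scale for x in row] for row in prod]
    return prod


def row_spread(m):
    """Largest L1 distance between normalized nonzero rows (zero iff rank 1)."""
    rows = [[x / sum(r) for x in r] for r in m if sum(r) > 0]
    return max(sum(abs(a - b) for a, b in zip(r1, r2))
               for r1 in rows for r2 in rows)


# Hypothetical model: M_ij(y) = P_ij * B[j][y] with the counting measure on F = {0, 1}.
P = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
mats = [[[P[i][j] * B[j][y] for j in range(2)] for i in range(2)] for y in range(2)]

limit = normalized_product(mats, [0, 1] * 20)
print(matrix_norm(limit), row_spread(limit))
```

For a nonnegative matrix with unit norm, a vanishing row spread means that all nonzero rows are multiples of one probability vector, which is the rank 1 structure required by Condition (KR).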

5.2. Proof of 2 ⇒ 3. We assume that Condition (KR) holds. Therefore, there exists a nonnegative column vector $u$ (which is not identically zero), and a probability measure $\varrho$, such that the rank 1 matrix $u\varrho$ is in the closure of the cone $\mathcal{C}$. In particular, for any $\delta>0$, we can choose $N\in\mathbb{N}$, $y_1,\ldots,y_N\in F$, $c>0$ such that

$$\|cM(y_1)\cdots M(y_N)-u\varrho\|<\delta.$$

Let $\alpha>0$ (to be chosen below), and define the set

$$S=\{\mu\in\mathcal{P}(E):\mu u>\alpha\}.$$

Then we can estimate

$$\sup_{\mu\in S}\left\|\frac{\mu M(y_1)\cdots M(y_N)}{\mu M(y_1)\cdots M(y_N)1}-\varrho\right\|<\frac{2\delta}{\alpha}.$$

In particular, by the triangle inequality,

$$\sup_{\mu,\nu\in S}\left\|\frac{\mu M(y_1)\cdots M(y_N)}{\mu M(y_1)\cdots M(y_N)1}-\frac{\nu M(y_1)\cdots M(y_N)}{\nu M(y_1)\cdots M(y_N)1}\right\|<\frac{4\delta}{\alpha}.$$
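
The first of these two estimates can be verified in a few lines. The derivation below is a sketch of one possible argument, not necessarily the authors' own. It writes $A=cM(y_1)\cdots M(y_N)$ and $\varrho$ for the probability measure in the rank 1 matrix (abbreviations introduced here), takes the norm of a row vector $\eta$ to be $\sup_{\|f\|_\infty\le1}\eta f$, and assumes $\delta<\alpha$ so that $\mu A1>0$ on $S$.

```latex
\begin{align*}
\|\mu A-(\mu u)\varrho\| &\le \|A-u\varrho\| < \delta,
\qquad
|\mu A1-\mu u| = |(\mu A-(\mu u)\varrho)1| < \delta, \\
\left\|\frac{\mu A}{\mu A1}-\varrho\right\|
 &\le \left\|\frac{\mu A}{\mu A1}-\frac{\mu A}{\mu u}\right\|
    + \left\|\frac{\mu A}{\mu u}-\varrho\right\|
  = \frac{|\mu u-\mu A1|}{\mu u}
    + \frac{\|\mu A-(\mu u)\varrho\|}{\mu u}
  < \frac{2\delta}{\alpha}.
\end{align*}
```

The equality in the second line uses $\|\mu A\|=\mu A1$, which holds because $\mu A$ is a nonnegative vector, and the final bound uses $\mu u>\alpha$ for $\mu\in S$. The constant $c$ cancels in the ratio $\mu A/\mu A1$.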
Moreover, note that

$$c\,\mu M(y_1)\cdots M(y_N)1\ge\mu u-\|cM(y_1)\cdots M(y_N)-u\varrho\|>\alpha-\delta$$

for all $\mu\in S$.

We aim to show that Condition (C) is satisfied. To this end, let $\varepsilon>0$ be given (and $\varepsilon<1$ without loss of generality). As $\lambda u>0$, we may choose $\alpha=\lambda u/2$ and $\delta=\alpha\varepsilon/4$. Now choose $N\in\mathbb{N}$, $y_1,\ldots,y_N\in F$, $c>0$ as above. When $F$ is at most countable, the above choices of $N$ and $S$, together with the singleton $O=\{(y_1,\ldots,y_N)\}$, satisfy properties 1-3 of Condition (C). Indeed, properties 2 and 3 are immediate from the above computations. To prove property 1, note that

$$\mathbf{P}(\pi_0^{\min}\in S\text{ and }\pi_0^{\max}\in S)\ge\mathbf{P}(\pi_0^{\max}u\ge\pi_0^{\min}u>\lambda u/2)=\mathbf{E}\big(\mathbf{P}(\pi_0^{\max}u\ge\pi_0^{\min}u\,|\,\mathcal{F}^Y_{-\infty,0})\,I_{]\lambda u/2,\infty[}(\pi_0^{\min}u)\big).$$

As trivially $\mathbf{P}(X\ge\mathbf{E}(X))>0$ for any random variable $X$, we have $\mathbf{P}$-a.s.

$$\mathbf{P}(\pi_0^{\max}u\ge\pi_0^{\min}u\,|\,\mathcal{F}^Y_{-\infty,0})=\mathbf{P}(\pi_0^{\max}u\ge\mathbf{E}(\pi_0^{\max}u\,|\,\mathcal{F}^Y_{-\infty,0})\,|\,\mathcal{F}^Y_{-\infty,0})>0,$$

while $\mathbf{P}(\pi_0^{\min}u>\lambda u/2)>0$ by virtue of the fact that $\mathbf{E}(\pi_0^{\min}u)=\lambda u$. Therefore $\mathbf{P}(\pi_0^{\min}\in S\text{ and }\pi_0^{\max}\in S)>0$, and the claim is established.

In the case where $F=\mathbb{R}^d$, we cannot choose $O$ to be a singleton as this set has Lebesgue measure zero. However, note that by the assumed continuity, all the above computations extend to a sufficiently small neighborhood of the path $(y_1,\ldots,y_N)$. Choosing $O$ to be such a neighborhood, we have $\varphi^{\otimes N}(O)>0$ by construction, and the remainder of the proof proceeds as in the countable case.

6. Sufficient conditions. Our main results, Theorems 2.6 and 2.7, estab- 
lish necessary and sufficient conditions for unique ergodicity of the filter. The 
purpose of this section is to discuss various sufficient conditions that have 
appeared in the literature, and their relations to our main results. First, we 
discuss the sufficient conditions introduced by Kaijser [8, 9] and show how 
these can be obtained directly from our Theorem 2.6. Then, we discuss vari- 
ous conditions that have been introduced in the context of the filter stability 
problem [5, 14, 15]. 

6.1. Kaijser's sufficient conditions. In Kaijser's 1975 paper [8], the fol- 
lowing condition is shown to be sufficient for unique ergodicity of the filter. 

Condition (K). Let $E$ and $F$ be finite sets, let $\varphi$ be the counting measure on $F$ and let the signal transition matrix $P$ be aperiodic. There exist $y_1,\ldots,y_n\in F$ such that the matrix $M=M(y_1)\cdots M(y_n)$ is nonzero and subrectangular, that is, $M_{ij}>0$ and $M_{kl}>0$ imply $M_{il}>0$ and $M_{kj}>0$.
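
Subrectangularity of a given matrix is a purely combinatorial property of its zero pattern and is easy to test. The helper below is a minimal sketch (the function names are ours, not Kaijser's): it checks the defining implication directly on a matrix given as a list of rows.

```python
def is_subrectangular(m):
    """True if M_ij > 0 and M_kl > 0 always imply M_il > 0 and M_kj > 0."""
    positive = [(i, j) for i, row in enumerate(m)
                for j, x in enumerate(row) if x > 0]
    return all(m[i][l] > 0 and m[k][j] > 0
               for (i, j) in positive for (k, l) in positive)


def is_nonzero_subrectangular(m):
    """The matrix property required in Condition (K)."""
    return any(x > 0 for row in m for x in row) and is_subrectangular(m)


print(is_nonzero_subrectangular([[0, 1, 1], [0, 0, 0], [0, 1, 1]]))  # positive block
print(is_nonzero_subrectangular([[1, 0], [0, 1]]))                   # identity pattern
```

Equivalently, the positive entries of a subrectangular matrix form a full rectangle of rows and columns, which is why the identity pattern fails the test.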
Kaijser's proof of sufficiency is based on the Furstenberg-Kesten theory of
products of random matrices. A much simpler proof was given by Kochman 
and Reeds in [10], Section 5, where Condition (K) is shown to imply Con- 
dition (KR) through an explicit computation. Kochman and Reeds prove 
the sufficiency of Condition (KR) by invoking a general result in the ergodic 
theory of Markov chains in topological state spaces. We would argue that 
the proof of sufficiency given here is even simpler, at least if one takes for 
granted the (essentially known) characterization of unique ergodicity of the 
filter provided by Theorem 3.1. 

Kaijser showed already in [8] by means of a counterexample that the 
subrectangularity condition cannot be dropped, that is, that irreducibility 
and aperiodicity of the signal need not imply unique ergodicity of the fil- 
ter. Kochman and Reeds provide two further counterexamples [10]. They 
demonstrate that the assumption of aperiodicity cannot be dropped in Con- 
dition (K), that is, that subrectangularity and irreducibility need not imply 
unique ergodicity of the filter. Moreover, they provide a counterexample 
where Condition (KR) is satisfied and the signal is irreducible and aperi- 
odic, but Condition (K) is not satisfied. Theorem 2.7 in this paper completes 
these results by establishing the necessity of Condition (KR) . 

In a recent paper, Kaijser [9] introduces two sufficient conditions for 
unique ergodicity of the filter in the case where E and F are countable. 



Condition (B1). Let $E$ and $F$ be countable, and let $\varphi$ be the counting measure. There exists a nonnegative function $u:E\to\mathbb{R}_+$ with $\|u\|_\infty=1$, a probability measure $\varrho$ on $E$, a sequence of integers $(n_k)_{k\in\mathbb{N}}$ and a sequence of observation paths $(y_1^k,\ldots,y_{n_k}^k)_{k\in\mathbb{N}}$ with $\|M(y_1^k)\cdots M(y_{n_k}^k)\|>0$, such that

$$\left\|\frac{\delta_xM(y_1^k)\cdots M(y_{n_k}^k)}{\|M(y_1^k)\cdots M(y_{n_k}^k)\|}-u(x)\varrho\right\|\xrightarrow{k\to\infty}0\quad\text{for all }x\in E.$$

(Here we have defined the norm $\|M\|:=\sup_{\|f\|_\infty\le1}\sup_{\|\mu\|_1\le1}\mu Mf$.)




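Condition (B). Let $E$ and $F$ be countable, and let $\varphi$ be the counting measure. For every $\beta>0$, there exists an $x_0\in E$ such that the following holds: given any tight set $T\subset\mathcal{P}(E)$ such that, for any $\mathsf{M}_0\in\mathcal{P}(\mathcal{P}(E))$ with $\int\nu\,\mathsf{M}_0(d\nu)=\lambda$,

$$\mathsf{M}_0(T\cap\{\nu\in\mathcal{P}(E):\nu_{x_0}\ge\lambda_{x_0}/2\})\ge\lambda_{x_0}/3,$$

there exist $N\in\mathbb{N}$ and $y_1,\ldots,y_N\in F$ such that $\delta_{x_0}M(y_1)\cdots M(y_N)1>0$ and

$$\left\|\frac{\mu M(y_1)\cdots M(y_N)}{\mu M(y_1)\cdots M(y_N)1}-\frac{\delta_{x_0}M(y_1)\cdots M(y_N)}{\delta_{x_0}M(y_1)\cdots M(y_N)1}\right\|<\beta$$

for all $\mu\in T\cap\{\nu\in\mathcal{P}(E):\nu_{x_0}\ge\lambda_{x_0}/2\}$.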
Kaijser shows that either of these conditions implies unique ergodicity
of the filter, provided the signal transition matrix P is aperiodic. Kaijser's
proof is very long and requires the development of some dedicated ergodicity
results for Markov chains in nonlocally compact spaces. We will presently
show that Condition (B1) and Condition (B) imply our Condition (C), so
that Kaijser's results follow easily from Theorem 2.6 (even in the case where
P is not aperiodic).

Lemma 6.1. Condition (B1) implies Condition (C).

Proof. Suppose that Condition (B1) holds. We can estimate

$$\left\|\frac{\mu M(y_1^k)\cdots M(y_{n_k}^k)}{\|M(y_1^k)\cdots M(y_{n_k}^k)\|}-(\mu u)\varrho\right\|\le\sum_{x=1}^{J}\mu_x\left\|\frac{\delta_xM(y_1^k)\cdots M(y_{n_k}^k)}{\|M(y_1^k)\cdots M(y_{n_k}^k)\|}-u(x)\varrho\right\|+2\sum_{x=J+1}^{\infty}\mu_x.$$

Let $T\subset\mathcal{P}(E)$ be a tight set. Then the first term converges to zero uniformly in $\mu\in T$ by assumption, while the second term can be made arbitrarily small uniformly in $\mu\in T$ by choosing $J$ sufficiently large. Therefore,

$$\sup_{\mu\in T}\left\|\frac{\mu M(y_1^k)\cdots M(y_{n_k}^k)}{\|M(y_1^k)\cdots M(y_{n_k}^k)\|}-(\mu u)\varrho\right\|<\delta$$

for any tight set $T\subset\mathcal{P}(E)$, $\delta>0$, and $k$ sufficiently large. Let $\alpha>0$ and define

$$S=T\cap\{\mu\in\mathcal{P}(E):\mu u>\alpha\}.$$

Then we obtain

$$\sup_{\mu,\nu\in S}\left\|\frac{\mu M(y_1^k)\cdots M(y_{n_k}^k)}{\mu M(y_1^k)\cdots M(y_{n_k}^k)1}-\frac{\nu M(y_1^k)\cdots M(y_{n_k}^k)}{\nu M(y_1^k)\cdots M(y_{n_k}^k)1}\right\|<\frac{4\delta}{\alpha}.$$

We now show that Condition (C) is satisfied. Let $\varepsilon>0$ be given, and choose $\alpha=\lambda u/2$ and $\delta=\alpha\varepsilon/4$. As in the proof of Theorem 2.7, we can show that

$$\mathbf{P}(\pi_0^{\min}u>\alpha\text{ and }\pi_0^{\max}u>\alpha)>0.$$

Moreover, we can find an increasing sequence of tight sets $T_n\subset\mathcal{P}(E)$ such that

$$\mathbf{P}(\pi_0^{\min}\in T_n\text{ and }\pi_0^{\max}\in T_n)\xrightarrow{n\to\infty}1,$$

as $\mathcal{P}(\mathcal{P}(E))$ is Polish. Therefore, we can choose $T$ sufficiently large such that

$$\mathbf{P}(\pi_0^{\min}\in S\text{ and }\pi_0^{\max}\in S)>0.$$

The remainder of the proof is identical to that of Theorem 2.7. □

Lemma 6.2. Condition (B) implies Condition (C).

Proof. Suppose that Condition (B) holds. We claim that Condition (C) holds with $\varepsilon=2\beta$, $S=T\cap\{\nu\in\mathcal{P}(E):\nu_{x_0}\ge\lambda_{x_0}/2\}$, and $O=\{(y_1,\ldots,y_N)\}$, provided that $T\subset\mathcal{P}(E)$ is chosen sufficiently large. Indeed, as the family $\mathcal{M}=\{\mathsf{M}\in\mathcal{P}(\mathcal{P}(E)):\int\nu\,\mathsf{M}(d\nu)=\lambda\}$ is tight (e.g., [7]) it is easily seen that

$$\mathsf{M}(T\cap\{\nu\in\mathcal{P}(E):\nu_{x_0}\ge\lambda_{x_0}/2\})\ge\lambda_{x_0}/3\quad\text{for all }\mathsf{M}\in\mathcal{M}$$

is satisfied for every sufficiently large tight set $T\subset\mathcal{P}(E)$. Moreover,

$$\mathbf{P}(\pi_0^{\min}\in S\text{ and }\pi_0^{\max}\in S)>0$$

when $T$ is chosen sufficiently large, as is shown in the proof of Lemma 6.1. It remains to note that as $\delta_{x_0}M(y_1)\cdots M(y_N)1>0$, we have $\mu M(y_1)\cdots M(y_N)1>0$ for all $\mu\in S$. The remainder of Condition (C) now follows immediately. □

Though Condition (B1) is strongly reminiscent of Condition (KR), we did
not succeed in extending the proof of the necessity of Condition (KR) to the
countable case. Whether Conditions (B1), (B) or some variant thereof are
necessary and sufficient for unique ergodicity in the countable case remains
an open problem.

6.2. Nondegeneracy and observability. Conditions of a rather different kind than those considered by Kaijser, Kochman and Reeds relate to the filter stability problem (see the survey [5]). By Theorem 3.1, however, filter stability and unique ergodicity are essentially equivalent, so that these conditions can also be brought to bear on the problem considered in this paper. In this section, we consider the following conditions that are borrowed from [14, 15]: nondegeneracy [Condition (N)], uniform observability [Condition (UO)] and observability [Condition (O)].

Condition (N). If $i,j\in E$ and $P_{ij}>0$, then $M_{ij}(y)>0$ for all $y\in F$.

Condition (UO). For every $\varepsilon>0$, there is a $\delta>0$ such that $\|\mathbf{P}^\mu|_{\mathcal{F}^Y_{1,\infty}}-\mathbf{P}^\nu|_{\mathcal{F}^Y_{1,\infty}}\|<\delta$ implies $\|\mu-\nu\|<\varepsilon$ [for any $\mu,\nu\in\mathcal{P}(E)$].

Condition (O). If $\mu,\nu\in\mathcal{P}(E)$ and $\mathbf{P}^\mu|_{\mathcal{F}^Y_{1,\infty}}=\mathbf{P}^\nu|_{\mathcal{F}^Y_{1,\infty}}$, then $\mu=\nu$.
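
When $E$ and $F$ are finite, Condition (N) can be checked entry by entry. The sketch below assumes the matrices are stored as a hypothetical dictionary `M` mapping each observation $y$ to the matrix $M(y)$ as a list of rows; the function name and the example numbers are ours.

```python
def satisfies_condition_n(P, M):
    """Condition (N): P_ij > 0 implies M_ij(y) > 0 for every observation y."""
    size = len(P)
    return all(M[y][i][j] > 0
               for y in M
               for i in range(size)
               for j in range(size)
               if P[i][j] > 0)


# Hypothetical two-state example with F = {0, 1}; note that M(0) + M(1) = P.
P = [[0.5, 0.5], [0.0, 1.0]]
noisy = {0: [[0.25, 0.1], [0.0, 0.3]], 1: [[0.25, 0.4], [0.0, 0.7]]}
exact = {0: [[0.5, 0.0], [0.0, 0.3]], 1: [[0.0, 0.5], [0.0, 0.7]]}

print(satisfies_condition_n(P, noisy))  # every allowed transition can emit every y
print(satisfies_condition_n(P, exact))  # some allowed transition rules out an observation
```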
Theorem 6.3. Suppose that one of the following holds:

• Condition (N) holds, and the signal transition matrix $P$ is aperiodic; or

• Condition (UO) holds; or

• Condition (O) holds, and $E$ is a finite set.

Then the filter admits a unique invariant measure $\mathsf{M}$, and $n^{-1}\sum_{k=1}^{n}\mathsf{M}_0\Pi^k\Rightarrow\mathsf{M}$ as $n\to\infty$ for any $\mathsf{M}_0\in\mathcal{P}(\mathcal{P}(E))$. If, in addition, the signal transition matrix $P$ is aperiodic, then we have $\mathsf{M}_0\Pi^n\Rightarrow\mathsf{M}$ as $n\to\infty$ for any $\mathsf{M}_0\in\mathcal{P}(\mathcal{P}(E))$.

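Sketch of proof. First, suppose that Condition (N) holds and that $P$ is aperiodic. Consider the stochastic process $(\tilde X_n,\tilde Y_n)_{n\in\mathbb{Z}}$ defined as

$$\tilde X_n=(X_n,X_{n+1})\in\tilde E,\qquad\tilde Y_n=Y_{n+1}\in F,$$

where $\tilde E=\{x\in E^2:\mathbf{P}(\tilde X_0=x)>0\}$. Then $(\tilde X_n,\tilde Y_n)_{n\in\mathbb{Z}}$ is a stationary Markov chain, $(\tilde X_n)_{n\in\mathbb{Z}}$ is an irreducible and aperiodic Markov chain, and $(\tilde Y_n)_{n\in\mathbb{Z}}$ are conditionally independent given $(\tilde X_n)_{n\in\mathbb{Z}}$. Moreover,

$$\mathbf{P}(\tilde Y_n\in A\,|\,(\tilde X_k)_{k\in\mathbb{Z}})=\int_AM_{X_nX_{n+1}}(y)\,\varphi(dy):=\int_A\Upsilon(\tilde X_n,y)\,\varphi(dy),$$

where $\Upsilon(x,y)>0$ for all $x\in\tilde E$ and $y\in F$ by Condition (N). Therefore,

$$\|\mathbf{P}^\mu(\tilde X_n\in\cdot\,|\tilde Y_0,\ldots,\tilde Y_n)-\mathbf{P}^\nu(\tilde X_n\in\cdot\,|\tilde Y_0,\ldots,\tilde Y_n)\|\xrightarrow{n\to\infty}0\quad\mathbf{P}^\mu\text{-a.s.}$$

for all $\mu,\nu\in\mathcal{P}(\tilde E)$ by [14], Corollary 5.5. It follows immediately that $\|\pi_n^\mu-\pi_n^\nu\|\to0$ as $n\to\infty$ $\mathbf{P}^\mu$-a.s. The proof is completed by invoking Theorem 3.1.

Next, suppose Condition (UO) holds. By a result of Blackwell and Dubins [3],

$$\|\mathbf{P}^\mu((Y_k)_{k>n}\in\cdot\,|\mathcal{F}^Y_{1,n})-\mathbf{P}^\nu((Y_k)_{k>n}\in\cdot\,|\mathcal{F}^Y_{1,n})\|\xrightarrow{n\to\infty}0\quad\mathbf{P}^\mu\text{-a.s.}$$

whenever $\mu\ll\nu$. But one can show (e.g., [15]) that

$$\mathbf{P}^\rho((Y_k)_{k>n}\in\cdot\,|\mathcal{F}^Y_{1,n})=\mathbf{P}^{\pi_n^\rho}((Y_k)_{k>0}\in\cdot\,)=\mathbf{P}^{\pi_n^\rho}|_{\mathcal{F}^Y_{1,\infty}}\quad\text{for all }\rho\in\mathcal{P}(E).$$

Using Condition (UO), it therefore follows that $\|\pi_n^\mu-\pi_n^\nu\|\to0$ as $n\to\infty$ $\mathbf{P}^\mu$-a.s. whenever $\mu\ll\nu$. The proof is completed by invoking Theorem 3.1.

Finally, suppose that $E$ is finite, and that Condition (O) holds. Then it is not difficult to establish, along the lines of [15], Proposition 3.5, that Condition (UO) is satisfied. The result therefore follows as above. □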

Remark 6.4. When the signal transition kernel P is periodic, Condition 
(N) by itself does not ensure unique ergodicity of the filter (this can be seen, 
e.g., by considering the example of a periodic signal in E = {1,2} with the 
trivial observation state space F = {1}). However, if Condition (N) holds and 
E is a finite set, a detectability condition [which is weaker than Condition
(O)] is necessary and sufficient for stability of the filter, and hence for unique 
ergodicity. The necessary arguments can be adapted from [13], Section 6.2, 
with some care. As this is somewhat outside the scope of this paper, we omit 
the details. 
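
The periodic example mentioned in Remark 6.4 is simple enough to compute by hand, and the following sketch merely replays it (the helper names are ours). With $F=\{1\}$ and the counting measure, the single matrix $M(1)$ equals the transition matrix of the signal, so the filter update is deterministic.

```python
def filter_step(pi, m):
    """One filter update: pi -> pi M(y) / (pi M(y) 1)."""
    size = len(pi)
    unnorm = [sum(pi[i] * m[i][j] for i in range(size)) for j in range(size)]
    total = sum(unnorm)
    return [x / total for x in unnorm]


def orbit(pi0, m, n):
    """The filter started at pi0, followed for n steps."""
    out = [list(pi0)]
    for _ in range(n):
        out.append(filter_step(out[-1], m))
    return out


P = [[0.0, 1.0], [1.0, 0.0]]  # periodic signal on E = {1, 2}
M1 = P                        # trivial observation space F = {1}

print(orbit([1.0, 0.0], M1, 4))  # alternates between the two point masses
print(orbit([0.5, 0.5], M1, 4))  # stays at the invariant measure of the signal
```

The two orbits correspond to two distinct invariant measures of the filter: the point mass at the uniform distribution, and the equal mixture of the point masses at $\delta_1$ and $\delta_2$. Condition (N) holds trivially here, since $M_{ij}(1)=P_{ij}$.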



It should be noted that none of the conditions of Theorem 6.3 are nec- 
essary. Indeed, Condition (N) is not satisfied by the examples given by 
Kochman and Reeds [10]. That Condition (UO) [hence Condition (O)] is not 
necessary can be seen from the trivial counterexample, where P is aperiodic 
and F = {1}. In this case the observations are completely noninformative, 
so that the point mass at $\lambda\in\mathcal{P}(E)$ is the unique invariant measure for the
filter, but Condition (UO) is not satisfied. 

Nonetheless, the sufficient conditions of Theorem 6.3 can be useful in 
practice, as they may be substantially easier to check than Condition (C) 
or (KR). For example, in the case where E is a finite set, verifying Condition
(O) is simply a matter of linear algebra (see [5] for an example), while
verifying Condition (KR) involves taking limits. Moreover, although Conditions
(C) and (KR) are both necessary and sufficient in many cases, we did
not succeed in our attempt to prove Theorem 6.3 by directly verifying that
Condition (C) or (KR) holds. Therefore, such sufficient but not necessary
conditions remain of independent interest.
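
One way the linear algebra for Condition (O) might look when $E$ and $F$ are finite and the reference measure is the counting measure is sketched below. This is our own reduction, not the procedure of [5]: since $\mathbf{P}^\mu(Y_1=y_1,\ldots,Y_n=y_n)=\mu M(y_1)\cdots M(y_n)1$, two initial laws give the same observation law exactly when their difference annihilates the span $V$ of the column vectors $M(y_1)\cdots M(y_n)1$, $n\ge0$. Under these assumptions Condition (O) holds if and only if $V$ is all of $\mathbb{R}^p$. The function names and the example matrices are hypothetical.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):  # eliminate entries below the pivot
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def is_observable(mats):
    """True if the vectors M(y_1)...M(y_n)1, n >= 0, span the whole space."""
    p = len(mats[0])
    basis = [[Fraction(1)] * p]  # n = 0: the all-ones column vector
    while True:
        images = [[sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
                  for m in mats for v in basis]
        candidates = basis + images
        if rank(candidates) == rank(basis):  # the span is invariant: stop
            return rank(basis) == p
        basis = candidates


F_ = Fraction
# Hypothetical examples where the observation reveals the current state: Y_n = X_n.
slow = [[[F_(9, 10), 0], [F_(2, 10), 0]], [[0, F_(1, 10)], [0, F_(8, 10)]]]
iid = [[[F_(1, 2), 0], [F_(1, 2), 0]], [[0, F_(1, 2)], [0, F_(1, 2)]]]

print(is_observable(slow))  # the initial law influences the law of Y_1
print(is_observable(iid))   # X_1 is independent of X_0, so Y carries no information on X_0
```

The loop terminates after at most $p$ rounds because the rank increases strictly until the span becomes invariant under every $M(y)$.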

APPENDIX: SUPPLEMENTARY PROOFS 
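A.1. Proof of Lemmas 2.3 and 2.4. We will need the following.

Lemma A.1. $\pi_n^{\min}=\pi_m^{\min}\circ\Theta^{n-m}$ and $\pi_n^{\max}=\pi_m^{\max}\circ\Theta^{n-m}$ for $m,n\in\mathbb{Z}$.

Proof. By stationarity of $\mathbf{P}$, it is easily seen that

$$\mathbf{E}(f(X_n)\,|\,\mathcal{F}^Y_{n-\ell,n})=\mathbf{E}(f(X_m)\,|\,\mathcal{F}^Y_{m-\ell,m})\circ\Theta^{n-m},$$

$$\mathbf{E}(f(X_n)\,|\,\mathcal{F}^Y_{n-\ell,n}\vee\mathcal{F}^X_{n-\ell,n-k})=\mathbf{E}(f(X_m)\,|\,\mathcal{F}^Y_{m-\ell,m}\vee\mathcal{F}^X_{m-\ell,m-k})\circ\Theta^{n-m}.$$

The result follows by letting $\ell\to\infty$, then $k\to\infty$. □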



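We begin by proving Lemma 2.3 for the case $\pi_n^\mu$. It clearly suffices to prove

$$\pi_n^\mu=\frac{\mu M(Y_1)\cdots M(Y_n)}{\mu M(Y_1)\cdots M(Y_n)1}\quad\mathbf{P}^\mu\text{-a.s. for all }n\ge1.$$

Let $f\in C_b(E)$ and $A\in\mathcal{B}(F^n)$. Then

$$\begin{aligned}\mathbf{E}^\mu\left(\frac{\mu M(Y_1)\cdots M(Y_n)f}{\mu M(Y_1)\cdots M(Y_n)1}\,I_A(Y_1,\ldots,Y_n)\right)&=\int_A\frac{\mu M(y_1)\cdots M(y_n)f}{\mu M(y_1)\cdots M(y_n)1}\,\mu M(y_1)\cdots M(y_n)1\,\varphi(dy_1)\cdots\varphi(dy_n)\\&=\int_A\mu M(y_1)\cdots M(y_n)f\,\varphi(dy_1)\cdots\varphi(dy_n)\\&=\mathbf{E}^\mu(I_A(Y_1,\ldots,Y_n)f(X_n)).\end{aligned}$$

As this holds for any $f\in C_b(E)$ and $A\in\mathcal{B}(F^n)$, the above expression for $\pi_n^\mu$ follows from the definition of the conditional expectation.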
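
For a finite model the identity just proved can be checked by brute force. The sketch below uses a hypothetical two-state model with the counting measure on $F=\{0,1\}$ (the matrices `P`, `B` and the function names are our assumptions). It compares the normalized matrix product with the conditional law of $X_n$ obtained by summing the joint weights $\mu(x_0)\prod_kM_{x_{k-1}x_k}(y_k)$ over all hidden paths.

```python
from itertools import product


def filter_recursion(mu, mats, ys):
    """mu M(y_1)...M(y_n) / (mu M(y_1)...M(y_n) 1), normalizing at every step."""
    pi = list(mu)
    size = len(pi)
    for y in ys:
        unnorm = [sum(pi[i] * mats[y][i][j] for i in range(size)) for j in range(size)]
        total = sum(unnorm)  # equals pi M(y) 1, assumed positive along the path
        pi = [x / total for x in unnorm]
    return pi


def filter_brute_force(mu, mats, ys):
    """Conditional law of X_n given Y_1..Y_n = ys, summing over all hidden paths."""
    size = len(mu)
    joint = [0.0] * size
    for path in product(range(size), repeat=len(ys) + 1):
        weight = mu[path[0]]
        for k, y in enumerate(ys):
            weight *= mats[y][path[k]][path[k + 1]]
        joint[path[-1]] += weight
    total = sum(joint)
    return [x / total for x in joint]


# Hypothetical model: M_ij(y) = P_ij * B[j][y].
P = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
mats = [[[P[i][j] * B[j][y] for j in range(2)] for i in range(2)] for y in range(2)]
mu = [0.5, 0.5]
ys = [0, 1, 1, 0, 1]

print(filter_recursion(mu, mats, ys))
print(filter_brute_force(mu, mats, ys))
```

Normalizing after every step rather than once at the end gives the same vector and avoids numerical underflow on long observation paths.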
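To prove Lemma 2.3 for $\pi_n^{\min}$, let $k,n\ge1$. Note that

$$\begin{aligned}\mathbf{E}(f(X_n)\,|\,\mathcal{F}^Y_{-k+1,n})&=\pi_{n+k}f\circ\Theta^{-k}\\&=\frac{\lambda M(Y_{-k+1})\cdots M(Y_n)f}{\lambda M(Y_{-k+1})\cdots M(Y_n)1}\\&=\frac{(\pi_k\circ\Theta^{-k})M(Y_1)\cdots M(Y_n)f}{(\pi_k\circ\Theta^{-k})M(Y_1)\cdots M(Y_n)1}.\end{aligned}$$

But $\mathbf{E}(f(X_n)\,|\,\mathcal{F}^Y_{-k+1,n})\to\pi_n^{\min}f$ and $\pi_kf\circ\Theta^{-k}=\mathbf{E}(f(X_0)\,|\,\mathcal{F}^Y_{-k+1,0})\to\pi_0^{\min}f$ as $k\to\infty$ $\mathbf{P}$-a.s. by the martingale convergence theorem. Therefore

$$\pi_n^{\min}=\frac{\pi_0^{\min}M(Y_1)\cdots M(Y_n)}{\pi_0^{\min}M(Y_1)\cdots M(Y_n)1}\quad\mathbf{P}\text{-a.s. for all }n\ge1,$$

and the result follows for arbitrary $m,n\in\mathbb{Z}$, $n\ge m$ by Lemma A.1.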
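To prove Lemma 2.3 for $\pi_n^{\max}$, let $n\ge1$ and $k\ge\ell\ge0$. Note that

$$\begin{aligned}\mathbf{E}(f(X_n)\,|\,\mathcal{F}^Y_{-k,n}\vee\mathcal{F}^X_{-k,-\ell})&=\mathbf{E}(f(X_{n+\ell})\,|\,\mathcal{F}^Y_{-k+\ell,n+\ell}\vee\mathcal{F}^X_{-k+\ell,0})\circ\Theta^{-\ell}\\&=\mathbf{E}(f(X_{n+\ell})\,|\,\mathcal{F}^Y_{1,n+\ell}\vee\sigma\{X_0\})\circ\Theta^{-\ell},\end{aligned}$$

where we have used the Markov property. Moreover, it is easily seen that

$$\mathbf{E}(f(X_{n+\ell})\,|\,\mathcal{F}^Y_{1,n+\ell}\vee\sigma\{X_0\})=\pi_{n+\ell}^{X_0}f=\frac{\delta_{X_0}M(Y_1)\cdots M(Y_{n+\ell})f}{\delta_{X_0}M(Y_1)\cdots M(Y_{n+\ell})1}.$$

Therefore, we can write

$$\mathbf{E}(f(X_n)\,|\,\mathcal{F}^Y_{-k,n}\vee\mathcal{F}^X_{-k,-\ell})=\frac{(\pi_\ell^{X_0}\circ\Theta^{-\ell})M(Y_1)\cdots M(Y_n)f}{(\pi_\ell^{X_0}\circ\Theta^{-\ell})M(Y_1)\cdots M(Y_n)1}.$$

Letting $k\to\infty$, then $\ell\to\infty$ and applying the martingale convergence theorem, we obtain the desired recursion for $\pi_n^{\max}$.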

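We now turn to the proof of Lemma 2.4. The stationarity of $(\pi_n^{\max})_{n\in\mathbb{Z}}$ and $(\pi_n^{\min})_{n\in\mathbb{Z}}$ follows directly from Lemma A.1 and the stationarity of $(X_n,Y_n)_{n\in\mathbb{Z}}$. It only remains to prove the Markov property. For $f\in C_b(\mathcal{P}(E))$, we can compute

$$\mathbf{E}(f(\pi_{n+1}^{\max})\,|\,\mathcal{G}_{-\infty,n})=\mathbf{E}\left(f\left(\frac{\pi_n^{\max}M(Y_{n+1})}{\pi_n^{\max}M(Y_{n+1})1}\right)\,\Big|\,\mathcal{G}_{-\infty,n}\right),$$

where we have used Lemma 2.3 and the fact that $\pi_n^{\max}$ is $\mathcal{G}_{-\infty,n}$-measurable. But for any bounded measurable function $g:F\to\mathbb{R}$, we have

$$\mathbf{E}(g(Y_{n+1})\,|\,\mathcal{G}_{-\infty,n})=\int g(y)\,\pi_n^{\max}M(y)1\,\varphi(dy).$$

The Markov property and the expression for the transition kernel $\Pi$ follow immediately. The Markov property of $\pi_n^{\min}$ and $\pi_n^\mu$ follows along similar lines.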

A.2. Proof of Theorem 3.1 and Lemma 3.4. The proof of Theorem 3.1
follows closely along the lines of [6, 11, 12]. We will sketch the necessary 
arguments, concentrating on the special features of the countable setting. 

We begin by establishing the Feller property. 

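Lemma A.2. Let $(\mu_n)_{n\in\mathbb{N}}\subset\mathcal{P}(E)$ and $\mu\in\mathcal{P}(E)$ be such that $\mu_n\Rightarrow\mu$. Then $\int f(\nu)\,\Pi(\mu_n,d\nu)\to\int f(\nu)\,\Pi(\mu,d\nu)$ for every $f\in C_b(\mathcal{P}(E))$.

Proof. Let $N\subset F$ be a $\varphi$-null set such that $\sup_i\sum_jM_{ij}(y)<\infty$ for all $y\notin N$. Then $\mu_nM(y)1\to\mu M(y)1$ for all $y\notin N$, and $\mu_nM(y)/\mu_nM(y)1\Rightarrow\mu M(y)/\mu M(y)1$ whenever $y\notin N$ and $\mu M(y)1>0$. It follows that

$$f\left(\frac{\mu_nM(y)}{\mu_nM(y)1}\right)\mu_nM(y)1\xrightarrow{n\to\infty}f\left(\frac{\mu M(y)}{\mu M(y)1}\right)\mu M(y)1\quad\text{for }\varphi\text{-a.e. }y.$$

But the family $\{f(\mu_nM(y)/\mu_nM(y)1)\,\mu_nM(y)1:n\in\mathbb{N}\}$ is uniformly integrable (under $\varphi$), as $|f(\mu_nM(y)/\mu_nM(y)1)\,\mu_nM(y)1|\le\|f\|_\infty\,\mu_nM(y)1$ and by Scheffé's lemma $\int|\mu_nM(y)1-\mu M(y)1|\,\varphi(dy)\to0$. The result therefore follows from the expression for $\Pi$ in Lemma 2.4. □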



We will need some basic elements from Choquet theory. 
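Definition A.3. Let $S$ be Polish. For $\mathsf{M},\mathsf{M}'\in\mathcal{P}(\mathcal{P}(S))$ we write $\mathsf{M}\prec\mathsf{M}'$ if

$$\int f(\nu)\,\mathsf{M}(d\nu)\le\int f(\nu)\,\mathsf{M}'(d\nu)\quad\text{for every convex }f\in C_b(\mathcal{P}(S)).$$

For any $\mathsf{M}\in\mathcal{P}(\mathcal{P}(S))$, the barycenter $b(\mathsf{M})\in\mathcal{P}(S)$ is defined as

$$b(\mathsf{M})u=\int\nu u\,\mathsf{M}(d\nu)\quad\text{for all }u\in C_b(S).$$

For any $\mu\in\mathcal{P}(S)$, define $m_\mu,\bar m_\mu\in\mathcal{P}(\mathcal{P}(S))$ as

$$\int f(\nu)\,m_\mu(d\nu)=f(\mu),\qquad\int f(\nu)\,\bar m_\mu(d\nu)=\int f(\delta_x)\,\mu(dx)$$

for every $f\in C_b(\mathcal{P}(S))$.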

Lemma A.4. Let $S$ be a Polish space. The following hold:

(1) Given $\mathsf{M}\in\mathcal{P}(\mathcal{P}(S))$, we have $f(b(\mathsf{M}))\le\int f(\nu)\,\mathsf{M}(d\nu)$ for every convex function $f\in C_b(\mathcal{P}(S))$ (Jensen's inequality).

(2) For any $\mathsf{M}\in\mathcal{P}(\mathcal{P}(S))$, we have $m_{b(\mathsf{M})}\prec\mathsf{M}\prec\bar m_{b(\mathsf{M})}$.

(3) If $\mathsf{M},\mathsf{M}'\in\mathcal{P}(\mathcal{P}(S))$, $\mathsf{M}\prec\mathsf{M}'$ and $\mathsf{M}'\prec\mathsf{M}$, we have $\mathsf{M}=\mathsf{M}'$.

In particular, $\prec$ defines a partial order on $\mathcal{P}(\mathcal{P}(S))$.

Proof. Jensen's inequality is proved as in [11], Lemma 3.1. The second property follows easily from Jensen's inequality. The third property follows from the fact that the family of convex functions in $C_b(\mathcal{P}(S))$ is a measure determining class (see, e.g., Proposition A1 in [12]). □
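
The ordering in the second property of Lemma A.4 can be made concrete on a two-point space. In the sketch below, a measure on $\mathcal{P}(S)$ with finite support is represented as a list of pairs (weight, probability vector); this representation, the function names and the numbers are our own choices for illustration. The code evaluates the convex function $f(\nu)=\sum_x\nu_x^2$ against $m_{b(\mathsf{M})}$, $\mathsf{M}$ and $\bar m_{b(\mathsf{M})}$.

```python
def barycenter(M):
    """b(M): the weighted average of the probability vectors charged by M."""
    size = len(M[0][1])
    return [sum(w * nu[x] for w, nu in M) for x in range(size)]


def integrate(f, M):
    """Integral of f against a finitely supported measure M on P(S)."""
    return sum(w * f(nu) for w, nu in M)


def point_mass_at(mu):
    """m_mu: the point mass at mu."""
    return [(1.0, list(mu))]


def mixture_of_diracs(mu):
    """bar m_mu: charges the Dirac measure at x with weight mu_x."""
    size = len(mu)
    return [(mu[x], [1.0 if i == x else 0.0 for i in range(size)])
            for x in range(size)]


def f(nu):
    return sum(v * v for v in nu)  # a bounded continuous convex function


M = [(0.25, [0.9, 0.1]), (0.75, [0.3, 0.7])]
b = barycenter(M)

lower = integrate(f, point_mass_at(b))
middle = integrate(f, M)
upper = integrate(f, mixture_of_diracs(b))
print(b, lower, middle, upper)
```

The three integrals come out in increasing order, as the lemma predicts for any convex test function.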

We now need some basic convexity properties of the filter. 

Lemma A.5. The following hold for any $\mathsf{M}\in\mathcal{P}(\mathcal{P}(E))$:

(1) If $f\in C_b(\mathcal{P}(E))$ is convex, then $\Pi f\in C_b(\mathcal{P}(E))$ is also convex.

(2) $b(\mathsf{M}\Pi)=b(\mathsf{M})P$.

(3) If $\mathsf{M}\Pi=\mathsf{M}$, then $b(\mathsf{M})=\lambda$.

(4) $m_{b(\mathsf{M})P^n}\Pi^m\prec\mathsf{M}\Pi^{m+n}\prec\bar m_{b(\mathsf{M})P^n}\Pi^m$ for any $m,n\ge0$.

Proof. The first claim follows as in [11], Lemma 3.2. The second claim follows directly from Lemma 2.4. The third claim follows from the second claim and the fact that $\lambda$ is the unique invariant measure for $P$. The fourth claim follows from the first and second claims, together with the second claim of Lemma A.4. □

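The following lemma connects $\pi_0^{\min}$, $\pi_0^{\max}$ to the filter transition kernel $\Pi$.

Lemma A.6. Denote by $\mathsf{M}^{\max},\mathsf{M}^{\min}\in\mathcal{P}(\mathcal{P}(E))$ the laws of $\pi_0^{\max}$ and $\pi_0^{\min}$, respectively. Then $m_\lambda\Pi^n\Rightarrow\mathsf{M}^{\min}$ and $\bar m_\lambda\Pi^n\Rightarrow\mathsf{M}^{\max}$ as $n\to\infty$.

Proof. Let $f\in C_b(\mathcal{P}(E))$. Then

$$m_\lambda\Pi^nf=\mathbf{E}[f(\pi_n)]=\mathbf{E}[f(\mathbf{P}(X_0\in\cdot\,|\mathcal{F}^Y_{-n+1,0}))]\xrightarrow{n\to\infty}\mathbf{E}[f(\pi_0^{\min})],$$

where we have used stationarity and martingale convergence. Similarly,

$$\begin{aligned}\bar m_\lambda\Pi^nf=\mathbf{E}[f(\pi_n^{X_0})]&=\mathbf{E}[f(\mathbf{P}(X_0\in\cdot\,|\mathcal{F}^Y_{-n+1,0}\vee\sigma\{X_{-n}\}))]\\&=\mathbf{E}[f(\mathbf{P}(X_0\in\cdot\,|\mathcal{F}^Y_{-\infty,0}\vee\mathcal{F}^X_{-\infty,-n}))]\xrightarrow{n\to\infty}\mathbf{E}[f(\pi_0^{\max})],\end{aligned}$$

where we have additionally used the Markov property of $(X_n,Y_n)_{n\in\mathbb{Z}}$. □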

Finally, we will need the following convergence property: 

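Lemma A.7. $\lim_{n\to\infty}\|\pi_n^\mu-\pi_n^\nu\|$ exists $\mathbf{P}^\mu$-a.s. whenever $\mu\ll\nu$.

Proof. It is not difficult to show along the lines of [14], Corollary 5.7, that

$$\|\pi_n^\mu-\pi_n^\nu\|=\frac{\mathbf{E}^\nu\big(\big|\mathbf{E}^\nu(\tfrac{d\mu}{d\nu}(X_0)\,|\,\mathcal{F}^X_{n,\infty}\vee\mathcal{F}^Y_{1,\infty})-\mathbf{E}^\nu(\tfrac{d\mu}{d\nu}(X_0)\,|\,\mathcal{F}^Y_{1,n})\big|\,\big|\,\mathcal{F}^Y_{1,n}\big)}{\mathbf{E}^\nu((d\mu/d\nu)(X_0)\,|\,\mathcal{F}^Y_{1,n})}$$

$\mathbf{P}^\nu$-a.s. whenever $\mu\ll\nu$. The denominator converges $\mathbf{P}^\nu$-a.s., hence $\mathbf{P}^\mu$-a.s. (as $\mu\ll\nu$), to a random variable which is strictly positive $\mathbf{P}^\mu$-a.s. To prove convergence of the numerator, let $\varepsilon>0$ and define

$$\begin{aligned}M_n&=\mathbf{E}^\nu\big(\tfrac{d\mu}{d\nu}(X_0)\,I_{d\mu/d\nu(X_0)\le\varepsilon}\,\big|\,\mathcal{F}^Y_{1,n}\big),&M'_n&=\mathbf{E}^\nu\big(\tfrac{d\mu}{d\nu}(X_0)\,I_{d\mu/d\nu(X_0)>\varepsilon}\,\big|\,\mathcal{F}^Y_{1,n}\big),\\L_n&=\mathbf{E}^\nu\big(\tfrac{d\mu}{d\nu}(X_0)\,I_{d\mu/d\nu(X_0)\le\varepsilon}\,\big|\,\mathcal{F}^X_{n,\infty}\vee\mathcal{F}^Y_{1,\infty}\big),&L'_n&=\mathbf{E}^\nu\big(\tfrac{d\mu}{d\nu}(X_0)\,I_{d\mu/d\nu(X_0)>\varepsilon}\,\big|\,\mathcal{F}^X_{n,\infty}\vee\mathcal{F}^Y_{1,\infty}\big).\end{aligned}$$

Clearly $M_n$ and $M'_n$ are uniformly integrable martingales, while $L_n$ and $L'_n$ are reverse martingales. Moreover, the numerator can be written as $\mathbf{E}^\nu(Z_n\,|\,\mathcal{F}^Y_{1,n})$ where $Z_n=|L_n+L'_n-M_n-M'_n|$. We proceed to estimate as follows:

$$|\mathbf{E}^\nu(Z_n-Z_\infty\,|\,\mathcal{F}^Y_{1,n})|\le\mathbf{E}^\nu(|L_n-L_\infty|\,|\,\mathcal{F}^Y_{1,n})+\mathbf{E}^\nu(|M_n-M_\infty|\,|\,\mathcal{F}^Y_{1,n})+4M'_n.$$

The first two terms converge to zero $\mathbf{P}^\nu$-a.s. as $n\to\infty$ by Hunt's lemma ([3], Theorem 2), while $\lim_{n\to\infty}M'_n$ vanishes if we let $\varepsilon\to\infty$. Therefore $\mathbf{E}^\nu(Z_n-Z_\infty\,|\,\mathcal{F}^Y_{1,n})\to0$ as $n\to\infty$ $\mathbf{P}^\nu$-a.s., and the proof is easily completed. □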

We now proceed to the proof of Theorem 3.1 and Lemma 3.4. 

A.2.1. Proof of Theorem 3.1 (1 ⇔ 2). First suppose $\mathbf{P}(\pi_0^{\max}=\pi_0^{\min})<1$. Then $\mathbf{E}(\{(\pi_0^{\max})_i-(\pi_0^{\min})_i\}^2)>0$ for some $i\in E$. Now note that

$$\begin{aligned}\mathbf{E}(\{(\pi_0^{\max})_i-(\pi_0^{\min})_i\}^2)&=\mathbf{E}(\{(\pi_0^{\max})_i\}^2)-\mathbf{E}(\{(\pi_0^{\min})_i\}^2)\\&=\int\{\nu_i\}^2\,\mathsf{M}^{\max}(d\nu)-\int\{\nu_i\}^2\,\mathsf{M}^{\min}(d\nu),\end{aligned}$$

so that $\mathbf{P}(\pi_0^{\max}=\pi_0^{\min})<1$ implies $\mathsf{M}^{\max}\ne\mathsf{M}^{\min}$. But $\mathsf{M}^{\max}$ and $\mathsf{M}^{\min}$ are invariant measures for $\Pi$ by Lemma 2.4, so we have shown that the filter admits two distinct invariant measures. Conversely, if the invariant measure is unique, then $\mathbf{P}(\pi_0^{\max}=\pi_0^{\min})=1$. Thus we have proved the implication 1 ⇒ 2.

Now suppose that $\pi_0^{\max}=\pi_0^{\min}$, so that in particular $\mathsf{M}^{\max}=\mathsf{M}^{\min}$. Let $\mathsf{M}$ be any invariant measure for $\Pi$. We claim that $\mathsf{M}^{\min}\prec\mathsf{M}\prec\mathsf{M}^{\max}$, so that necessarily $\mathsf{M}=\mathsf{M}^{\max}=\mathsf{M}^{\min}$ by Lemma A.4. To prove the claim, note that $m_\lambda\Pi^n\prec\mathsf{M}\Pi^n=\mathsf{M}\prec\bar m_\lambda\Pi^n$ for any $n\ge0$ by Lemmas A.4 and A.5. The claim therefore follows directly from Lemma A.6. Thus we have proved the implication 2 ⇒ 1.

A.2.2. Proof of Theorem 3.1 (2 ⇔ 3). Proceeding along the same lines as in the proof of Lemma A.6 (and taking into account the fact that weak convergence and total variation convergence of probability measures coincide when the state space is countable), one can show that

$$\mathbf{E}(\|\pi_n^{X_0}-\pi_n\|)\xrightarrow{n\to\infty}\mathbf{E}(\|\pi_0^{\max}-\pi_0^{\min}\|).$$

Suppose first that property 3 holds. Then

e(||**» - 7r B ||) = WIK - t„||) n ^ o. 

Therefore $\pi_0^{\max}=\pi_0^{\min}$, and we have proved the implication 3 ⇒ 2.

Conversely, suppose that $\pi_0^{\max}=\pi_0^{\min}$. Let $\mu,\nu\in\mathcal{P}(E)$ such that $\mu\ll\nu$. Note that we can write $\pi_n^\mu=\mathbf{E}^\mu(\pi_n^{X_0}\,|\,\mathcal{F}^Y_{1,n})$. Therefore, we have

$$\mathbf{E}^\mu(\|\pi_n^\mu-\pi_n\|)=\mathbf{E}^\mu(\|\mathbf{E}^\mu(\pi_n^{X_0}-\pi_n\,|\,\mathcal{F}^Y_{1,n})\|)\le\mathbf{E}^\mu(\|\pi_n^{X_0}-\pi_n\|).$$

But $\mathbf{E}(\|\pi_n^{X_0}-\pi_n\|)\to0$, so $\|\pi_n^{X_0}-\pi_n\|\to0$ in probability. As $\mu\ll\lambda$, we find that $\|\pi_n^{X_0}-\pi_n\|\to0$ in $\mathbf{P}^\mu$-probability also, and by dominated convergence

$$\mathbf{E}^\mu(\|\pi_n^\mu-\pi_n\|)\le\mathbf{E}^\mu(\|\pi_n^{X_0}-\pi_n\|)\xrightarrow{n\to\infty}0.$$

By Lemma A.7, it follows that $\|\pi_n^\mu-\pi_n\|\to0$ $\mathbf{P}^\mu$-a.s. Similarly, we find that $\|\pi_n^\nu-\pi_n\|\to0$ $\mathbf{P}^\nu$-a.s., hence $\mathbf{P}^\mu$-a.s. as $\mu\ll\nu$. Therefore $\|\pi_n^\mu-\pi_n^\nu\|\to0$ $\mathbf{P}^\mu$-a.s., and we have proved the implication 2 ⇒ 3.

A.2.3. Proof of Theorem 3.1 (1 ⇔ 4). The implication 4 ⇒ 1 follows immediately by choosing $\mathsf{M}_0$ and $\mathsf{M}$ to be distinct invariant measures of the filter and applying property 4, which leads to a contradiction.

To prove the converse implication, choose $\mathsf{M}_0$ arbitrarily and define the measures $\mathsf{M}_n=n^{-1}\sum_{k=1}^{n}\mathsf{M}_0\Pi^k$. Note that $b(\mathsf{M}_n)=n^{-1}\sum_{k=1}^{n}b(\mathsf{M}_0)P^k\Rightarrow\lambda$ as the signal is irreducible and positive recurrent. It follows from [7] that the sequence $(\mathsf{M}_n)_{n\in\mathbb{N}}$ is tight. It therefore suffices to prove that every convergent subsequence has the same limit. But it is easily seen that any convergent subsequence converges to an invariant measure of $\Pi$, so that the result follows from the uniqueness of the invariant measure. Thus we have proved the implication 1 ⇒ 4.

A.2.4. Proof of Theorem 3.1 (1 ⇔ 5). The implication 5 ⇒ 1 follows immediately by choosing $\mathsf{M}_0$ and $\mathsf{M}$ to be distinct invariant measures of the filter and applying property 5, which leads to a contradiction.

We prove the converse implication under the assumption that the signal transition matrix $P$ is aperiodic. Choose $\mathsf{M}_0$ arbitrarily, and note that $b(\mathsf{M}_0\Pi^n)=b(\mathsf{M}_0)P^n\Rightarrow\lambda$. It follows from [7] that the sequence $(\mathsf{M}_0\Pi^n)_{n\in\mathbb{N}}$ is tight. It therefore suffices to prove that any convergent subsequence converges to the unique invariant measure of the filter $\mathsf{M}$. Let $n(k)\nearrow\infty$ be a subsequence such that $\mathsf{M}_0\Pi^{n(k)}\Rightarrow\mathsf{M}_\infty$, and let $f\in C_b(\mathcal{P}(E))$ be convex. By Lemma A.5, we have

$$m_{b(\mathsf{M}_0)P^{n(k)-m}}\Pi^mf\le\mathsf{M}_0\Pi^{n(k)}f\le\bar m_{b(\mathsf{M}_0)P^{n(k)-m}}\Pi^mf\quad\text{for all }m\le n(k).$$

In particular, letting $k\to\infty$ and using the Feller property, we have

$$m_\lambda\Pi^mf\le\mathsf{M}_\infty f\le\bar m_\lambda\Pi^mf\quad\text{for all }m\ge0.$$

But letting $m\to\infty$ and using Lemma A.6, we find that $\mathsf{M}^{\min}\prec\mathsf{M}_\infty\prec\mathsf{M}^{\max}$. As the invariant measure $\mathsf{M}$ is presumed to be unique, we have $\mathsf{M}^{\min}=\mathsf{M}^{\max}=\mathsf{M}$ by the implication 1 ⇒ 2. Therefore, we find that $\mathsf{M}_\infty=\mathsf{M}$ by Lemma A.4. This completes the proof of the implication 1 ⇒ 5.
A.2.5. Proof of Lemma 3.4. First, we note that $\mathbf{E}((\pi_0^{\max})_i\,|\,\mathcal{F}^Y_{-\infty,0})=(\pi_0^{\min})_i$. Therefore, $\mathbf{P}((\pi_0^{\min})_i=0\text{ and }(\pi_0^{\max})_i>0)=0$ for every $i\in E$. In particular, this implies that we have $\pi_0^{\max}\ll\pi_0^{\min}$ with unit probability under $\mathbf{P}$. Now note that $\pi_n^{\max}=\pi_n^\mu|_{\mu=\pi_0^{\max}}$ and $\pi_n^{\min}=\pi_n^\mu|_{\mu=\pi_0^{\min}}$ by Lemma 2.3. Therefore,



P lim ||7rr x -vrr n || exists ^-^,0 



lim 1 1 vr^f — ir^W exists l^z-o^o 
k— >oo 



'(lim K 



Pn lim 11^ — vr^|| exists 



1 P-a.s., 



where we have used the fact that $\pi_0^{\min}$ and $\pi_0^{\max}$ are $\mathcal{G}_{-\infty,0}$-measurable in
the first step, the Markov property in the second step, and Lemma A.7 in
the third step. The result now follows by taking the expectation with respect 
to P. 